]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
7 years agommc: sdhci: Add version V4 definition
Chunyan Zhang [Thu, 30 Aug 2018 08:21:37 +0000 (16:21 +0800)]
mmc: sdhci: Add version V4 definition

Added definitions for v400, v410, v420.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: fix inconsistent IS_ERR and PTR_ERR
YueHaibing [Tue, 4 Sep 2018 02:59:09 +0000 (10:59 +0800)]
mmc: tegra: fix inconsistent IS_ERR and PTR_ERR

Fix inconsistent IS_ERR and PTR_ERR in tegra_sdhci_init_pinctrl_info,
the proper pointer to be passed as argument is 'pinctrl_state_1v8'

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Aapo Vienamo <aapo.vienamo@iki.fi>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Implement periodic pad calibration
Aapo Vienamo [Mon, 20 Aug 2018 09:23:33 +0000 (12:23 +0300)]
mmc: tegra: Implement periodic pad calibration

Rerun the pad calibration procedure before sdhci_request() if
the 100 ms recalibration interval has been exceeded.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Implement HS400 delay line calibration
Aapo Vienamo [Fri, 10 Aug 2018 18:14:01 +0000 (21:14 +0300)]
mmc: tegra: Implement HS400 delay line calibration

Implement HS400 specific delay line calibration procedure. This is a
Tegra specific procedure and has to be performed regardless whether
enhanced strobe or HS400 tuning is used.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Implement HS400 enhanced strobe
Aapo Vienamo [Fri, 10 Aug 2018 18:14:00 +0000 (21:14 +0300)]
mmc: tegra: Implement HS400 enhanced strobe

Implement eMMC HS400 enhanced strobe. Enhanced strobe is an alternative
mechanism to the HS400 tuning procedure.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Parse and program DQS trim value
Aapo Vienamo [Fri, 10 Aug 2018 18:13:59 +0000 (21:13 +0300)]
mmc: tegra: Parse and program DQS trim value

Parse and program the HS400 DQS trim value from DT. Program a fallback
value in case the property is missing.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: mmc: Add DQS trim value to Tegra SDHCI
Aapo Vienamo [Fri, 10 Aug 2018 18:13:58 +0000 (21:13 +0300)]
dt-bindings: mmc: Add DQS trim value to Tegra SDHCI

Document HS400 DQS trim value device tree property.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Enable UHS and HS200 modes for Tegra186
Aapo Vienamo [Thu, 30 Aug 2018 15:06:28 +0000 (18:06 +0300)]
mmc: tegra: Enable UHS and HS200 modes for Tegra186

Set nvquirks to enable higher speed modes.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Enable UHS and HS200 modes for Tegra210
Aapo Vienamo [Thu, 30 Aug 2018 15:06:27 +0000 (18:06 +0300)]
mmc: tegra: Enable UHS and HS200 modes for Tegra210

Set nvquirks to enable higher speed modes.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Disable card clock during tuning cmd on Tegra210
Aapo Vienamo [Thu, 30 Aug 2018 15:06:26 +0000 (18:06 +0300)]
mmc: tegra: Disable card clock during tuning cmd on Tegra210

Implement tegra210_sdhci_writew() to disable card clock and issue a
reset when the tuning command is sent. This is done to prevent an
intermittent hang with around 10 % failure rate during tuning.

Add tegra186_sdhci_ops because this workaround is specific to Tegra210.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Remove tegra_sdhci_writew() from tegra210_sdhci_ops
Aapo Vienamo [Thu, 30 Aug 2018 15:06:25 +0000 (18:06 +0300)]
mmc: tegra: Remove tegra_sdhci_writew() from tegra210_sdhci_ops

tegra_sdhci_writew() defers the write to SDHCI_TRANSFER_MODE until
SDHCI_COMMAND is written. This is not necessary on Tegra210 and Tegra186
and it breaks read-modify-write operations on SDHCI_TRANSFER_MODE
because writes to SDHCI_TRANSFER_MODE aren't visible until SDHCI_COMMAND
has been written to. This results in tuning failures on Tegra210.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Use standard SDHCI tuning on Tegra210 and Tegra186
Aapo Vienamo [Thu, 30 Aug 2018 15:06:24 +0000 (18:06 +0300)]
mmc: tegra: Use standard SDHCI tuning on Tegra210 and Tegra186

Add a new sdhci_ops struct for Tegra210 and Tegra186 which doesn't
set the custom tuning callback used on previous SoC generations.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Configure default trim value on reset
Aapo Vienamo [Thu, 30 Aug 2018 15:06:23 +0000 (18:06 +0300)]
mmc: tegra: Configure default trim value on reset

Program the outbound sampling trim value in tegra_sdhci_reset(). Unlike
the outbound tap value this does not depend on the signaling mode and
needs to be only programmed once.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Configure default tap values
Aapo Vienamo [Thu, 30 Aug 2018 15:06:22 +0000 (18:06 +0300)]
mmc: tegra: Configure default tap values

Set the default inbound timing adjustment tap value on reset and on
non-tunable modes.

The default tap value is not programmed on tunable modes because the
tuning sequence is used instead to determine the tap value.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Parse default trim and tap from dt
Aapo Vienamo [Thu, 30 Aug 2018 15:06:21 +0000 (18:06 +0300)]
mmc: tegra: Parse default trim and tap from dt

Parse the default inbound and outbound sampling trimmer values from
the device tree.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Add a workaround for tap value change glitch
Aapo Vienamo [Thu, 30 Aug 2018 15:06:20 +0000 (18:06 +0300)]
mmc: tegra: Add a workaround for tap value change glitch

Add quirk to disable the card clock during configuration of the tap
value in tegra_sdhci_set_tap() and issue sdhci_reset() after value
change. This is a workaround to avoid propagation of a potential
glitch caused by setting the tap value.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Enable pad calibration on Tegra210 and Tegra186
Aapo Vienamo [Thu, 30 Aug 2018 15:06:19 +0000 (18:06 +0300)]
mmc: tegra: Enable pad calibration on Tegra210 and Tegra186

Set NVQUIRK_HAS_PADCALIB on Tegra210 and Tegra186 to enable automatic
pad drive strength calibration.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Perform pad calibration after voltage switch
Aapo Vienamo [Thu, 30 Aug 2018 15:06:18 +0000 (18:06 +0300)]
mmc: tegra: Perform pad calibration after voltage switch

Run the automatic pad calibration after voltage switching if
tegra_host->pad_calib_required is set.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Program pad autocal offsets from dt
Aapo Vienamo [Thu, 30 Aug 2018 15:06:17 +0000 (18:06 +0300)]
mmc: tegra: Program pad autocal offsets from dt

Parse the pad drive strength calibration offsets from the device tree.
Program the calibration offsets in accordance with the current signaling
mode.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Disable card clock during pad calibration
Aapo Vienamo [Thu, 30 Aug 2018 15:06:16 +0000 (18:06 +0300)]
mmc: tegra: Disable card clock during pad calibration

Disable the card clock during automatic pad drive strength calibration
and re-enable it afterwards.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Power on the calibration pad
Aapo Vienamo [Thu, 30 Aug 2018 15:06:15 +0000 (18:06 +0300)]
mmc: tegra: Power on the calibration pad

Automatic pad drive strength calibration is performed on a separate pad
identical to the ones used for driving the actual bus. Power on the
calibration pad during the calibration procedure and power it off
afterwards to save power.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Set calibration pad voltage reference
Aapo Vienamo [Thu, 30 Aug 2018 15:06:14 +0000 (18:06 +0300)]
mmc: tegra: Set calibration pad voltage reference

Configure the voltage reference used by the automatic pad drive strength
calibration procedure. The value is a magic number from the TRM.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Poll for calibration completion
Aapo Vienamo [Thu, 30 Aug 2018 15:06:13 +0000 (18:06 +0300)]
mmc: tegra: Poll for calibration completion

Implement polling with 10 ms timeout for automatic pad drive strength
calibration.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tegra: Reconfigure pad voltages during voltage switching
Aapo Vienamo [Thu, 30 Aug 2018 15:06:12 +0000 (18:06 +0300)]
mmc: tegra: Reconfigure pad voltages during voltage switching

Parse the pinctrl state and nvidia,only-1-8-v properties from the device
tree. Validate the pinctrl and regulator configuration before unmasking
UHS modes. Implement pad voltage state reconfiguration in the mmc
start_signal_voltage_switch() callback. Add NVQUIRK_NEEDS_PAD_CONTROL
and add set it for Tegra210 and Tegra186.

The pad configuration is done in the mmc callback because the order of
pad reconfiguration and sdhci voltage switch depend on the voltage to
which the transition occurs.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: mmc: Add Tegra SDHCI sampling trimmer values
Aapo Vienamo [Thu, 30 Aug 2018 15:06:05 +0000 (18:06 +0300)]
dt-bindings: mmc: Add Tegra SDHCI sampling trimmer values

Document the Tegra SDHCI inbound and outbound sampling trimmer values.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: Add Tegra SDHCI pad pdpu offset bindings
Aapo Vienamo [Thu, 30 Aug 2018 15:06:04 +0000 (18:06 +0300)]
dt-bindings: Add Tegra SDHCI pad pdpu offset bindings

Add bindings documentation for pad pull up and pull down offset values to be
programmed before executing automatic pad drive strength calibration.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: mmc: tegra: Add pad voltage control properties
Aapo Vienamo [Thu, 30 Aug 2018 15:06:03 +0000 (18:06 +0300)]
dt-bindings: mmc: tegra: Add pad voltage control properties

Document the pinctrl bindings used by the SDHCI driver to reconfigure
pad voltages on controllers supporting multiple voltage levels.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: remove now unused variable
Wolfram Sang [Thu, 30 Aug 2018 12:16:03 +0000 (14:16 +0200)]
mmc: tmio: remove now unused variable

This variable is unused now after some refactoring.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: more concise clk calculation
Wolfram Sang [Thu, 30 Aug 2018 12:14:38 +0000 (14:14 +0200)]
mmc: tmio: more concise clk calculation

Concise, but still readable.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: Fix SCC error detection
Masaharu Hayakawa [Wed, 29 Aug 2018 23:32:07 +0000 (01:32 +0200)]
mmc: tmio: Fix SCC error detection

SDR104, HS200 and HS400 need to check for SCC error. If SCC error is
detected, retuning is necessary.

Signed-off-by: Masaharu Hayakawa <masaharu.hayakawa.ry@renesas.com>
[Niklas: update commit message]
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: renesas_sdhi: skip SCC error check when retuning
Masaharu Hayakawa [Wed, 29 Aug 2018 23:32:06 +0000 (01:32 +0200)]
mmc: renesas_sdhi: skip SCC error check when retuning

Checking for SCC error during retuning is unnecessary.

Signed-off-by: Masaharu Hayakawa <masaharu.hayakawa.ry@renesas.com>
[Niklas: fix small style issue]
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: core: add helper to see if a host is doing a retune
Niklas Söderlund [Wed, 29 Aug 2018 23:32:05 +0000 (01:32 +0200)]
mmc: core: add helper to see if a host is doing a retune

Add a helper to allow host drivers checking if a retune is in progress.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: refactor CLK_CTL bit calculation
Masahiro Yamada [Thu, 23 Aug 2018 04:44:20 +0000 (13:44 +0900)]
mmc: tmio: refactor CLK_CTL bit calculation

for (clk = 0x80000080; new_clock >= (clock << 1); clk >>= 1)
          clock <<= 1;

... is too tricky, hence I replaced with

  roundup_pow_of_two(divisor) >> 2

'(clk >> 22) & 0x1' is the bit test for the 1/1 divisor, but
it is not clear.  'divisor <= 1' is easier to understand.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: renesas_sdhi: merge clk_{start,stop} functions to set_clock
Masahiro Yamada [Thu, 23 Aug 2018 04:44:19 +0000 (13:44 +0900)]
mmc: renesas_sdhi: merge clk_{start,stop} functions to set_clock

renesas_sdhi_clk_start() and renesas_sdhi_clk_stop() are now only
called from renesas_sdhi_set_clock().  Merge them.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci-of-dwcmshc: solve 128MB DMA boundary limitation
Jisheng Zhang [Tue, 28 Aug 2018 09:48:14 +0000 (17:48 +0800)]
mmc: sdhci-of-dwcmshc: solve 128MB DMA boundary limitation

When using DMA, if the DMA addr spans 128MB boundary, we have to split
the DMA transfer into two so that each one doesn't exceed the boundary.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci: introduce adma_write_desc() hook to struct sdhci_ops
Jisheng Zhang [Tue, 28 Aug 2018 09:47:23 +0000 (17:47 +0800)]
mmc: sdhci: introduce adma_write_desc() hook to struct sdhci_ops

Add this hook so that it can be overridden with driver specific
implementations. We also let the original sdhci_adma_write_desc()
accept &desc so that the function can set its new value. Then export
the function so that it could be reused by driver's specific
implementations.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci: add adma_table_cnt member to struct sdhci_host
Jisheng Zhang [Tue, 28 Aug 2018 09:46:35 +0000 (17:46 +0800)]
mmc: sdhci: add adma_table_cnt member to struct sdhci_host

This patch adds adma_table_cnt member to struct sdhci_host to give more
flexibility to drivers to control the ADMA table count.

Default value of adma_table_cnt is set to (SDHCI_MAX_SEGS * 2 + 1).

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: Convert to using %pOFn instead of device_node.name
Rob Herring [Tue, 28 Aug 2018 01:52:33 +0000 (20:52 -0500)]
mmc: Convert to using %pOFn instead of device_node.name

In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hu Ziji <huziji@marvell.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-mmc@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci: Export sdhci_request()
Aapo Vienamo [Mon, 20 Aug 2018 09:23:32 +0000 (12:23 +0300)]
mmc: sdhci: Export sdhci_request()

Allow SDHCI drivers to hook code before and after sdhci_request() by
making it externally visible.

Signed-off-by: Aapo Vienamo <avienamo@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agosdhci: acpi: add qcom sdhci host reset quirk fix
Wang Dongsheng [Thu, 16 Aug 2018 04:48:43 +0000 (12:48 +0800)]
sdhci: acpi: add qcom sdhci host reset quirk fix

After host requests RESET_FOR_ALL action, the hardware output an
interrupt for OS and waiting for the OS to approve.

Before writing this fix, ACPI GED has handled the interrupt. But
the ACPI GED belongs to a slow process, and sometimes the handling
process time is more than 100ms(Mutex wait more than 100ms). So
drop the GED solution and add this quirk fix.

Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agosdhci: acpi: add free_slot callback
Wang Dongsheng [Thu, 16 Aug 2018 04:48:42 +0000 (12:48 +0800)]
sdhci: acpi: add free_slot callback

The device specific resource can be free in free_slot after
removing host controller.

Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci-of-esdhc: add erratum A008171 support
Yinbo Zhu [Thu, 23 Aug 2018 08:48:32 +0000 (16:48 +0800)]
mmc: sdhci-of-esdhc: add erratum A008171 support

In tuning mode of operation, when TBCTL[TB_EN] is set, eSDHC may report
one of the following errors :
1)Tuning error while running tuning operation where SYSCTL2[SAMPCLKSEL]
will not get set even when SYSCTL2[EXTN] is reset. OR
2)Data transaction error (e.g. IRQSTAT[DCE], IRQSTAT[DEBE]) during data
transaction errors.
This issue occurs when the data window sampled within eSDHC is in full
cycle. So, in that case, eSDHC is not able to find out the start and
end points of the data window and sets the sampling pointer at default
location (which is middle of the internal SD clock). If this sampling
point coincides with the data eye boundary, then it can result in the
above mentioned errors. Impact: Tuning mode of operation for SDR50,
SDR104 or HS200 speed modes may not work properly
Workaround: In case eSDHC reports tuning error or data errors in tuning
mode of operation, by add the erratum A008171 support to fix the issue.

Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci: add tuning error codes
Yinbo Zhu [Thu, 23 Aug 2018 08:48:31 +0000 (16:48 +0800)]
mmc: sdhci: add tuning error codes

This patch is to add tuning error codes to
judge tuning state

Signed-off-by: Yinbo Zhu <yinbo.zhu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: uniphier-sd: add UniPhier SD/eMMC controller driver
Masahiro Yamada [Thu, 23 Aug 2018 04:44:18 +0000 (13:44 +0900)]
mmc: uniphier-sd: add UniPhier SD/eMMC controller driver

Here is another TMIO MMC variant found in Socionext UniPhier SoCs.

As commit b6147490e6aa ("mmc: tmio: split core functionality, DMA and
MFD glue") said, these MMC controllers use the IP from Panasonic.

However, the MMC controller in the TMIO (Toshiba Mobile IO) MFD chip
was the first upstreamed user of this IP.  The common driver code
for this IP is now called 'tmio-mmc-core' in Linux although it is a
historical misnomer.

Anyway, this driver select's MMC_TMIO_CORE to borrow the common code
from tmio-mmc-core.c

Older UniPhier SoCs (LD4, Pro4, sLD8) support the external DMA engine
like renesas_sdhi_sys_dmac.c.  The difference is UniPhier SoCs use a
single DMA channel whereas Renesas chips request separate channels for
RX and TX.

Newer UniPhier SoCs (Pro5 and later) support the internal DMA engine
like renesas_sdhi_internal_dmac.c  The register map is almost the same,
so I guess Renesas and Socionext use the same internal DMA hardware.
The main difference is, the register offsets are doubled for Renesas.

                        Renesas      Socionext
                        SDHI         UniPhier
  DM_CM_DTRAN_MODE      0x820        0x410
  DM_CM_DTRAN_CTRL      0x828        0x414
  DM_CM_RST             0x830        0x418
  DM_CM_INFO1           0x840        0x420
  DM_CM_INFO1_MASK      0x848        0x424
  DM_CM_INFO2           0x850        0x428
  DM_CM_INFO2_MASK      0x858        0x42c
  DM_DTRAN_ADDR         0x880        0x440
  DM_DTRAN_ADDREX        ---         0x444

This comes from the difference of host->bus_shift; 2 for Renesas SoCs,
and 1 for UniPhier SoCs.  Also, the datasheet for UniPhier SoCs defines
DM_DTRAN_ADDR and DM_DTRAN_ADDREX as two separate registers.

It could be possible to factor out the DMA common code by introducing
some hooks to cope with platform quirks, but this patch does not touch
that for now.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: mmc: add DT binding for UniPhier SD/eMMC controller
Masahiro Yamada [Thu, 23 Aug 2018 04:44:17 +0000 (13:44 +0900)]
dt-bindings: mmc: add DT binding for UniPhier SD/eMMC controller

This SD/eMMC controller is used for UniPhier SoC family.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: move tmio_mmc_set_clock() to platform hook
Masahiro Yamada [Thu, 23 Aug 2018 04:44:16 +0000 (13:44 +0900)]
mmc: tmio: move tmio_mmc_set_clock() to platform hook

tmio_mmc_set_clock() is full of quirks because different SoC vendors
extended this in different ways.

The original IP defines the divisor range 1/2 ... 1/512.

 bit 7 is set:    1/512
 bit 6 is set:    1/256
   ...
 bit 0 is set:    1/4
 all bits clear:  1/2

It is platform-dependent how to achieve the 1/1 clock.

I guess the TMIO-MFD variant uses the clock selector outside of this IP,
as far as I see tmio_core_mmc_clk_div() in drivers/mfd/tmio_core.c

I guess bit[7:0]=0xff is Renesas-specific extension.

Socionext (and Panasonic) uses bit 10 (CLKSEL) for 1/1.  Also, newer
versions of UniPhier SoC variants use bit 16 for 1/1024.

host->clk_update() is only used by the Renesas variants, whereas
host->set_clk_div() is only used by the TMIO-MFD variants.

To cope with this mess, promote tmio_mmc_set_clock() to a new
platform hook ->set_clock(), and melt the old two hooks into it.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: tmio: replace tmio_mmc_clk_stop() calls with tmio_mmc_set_clock()
Masahiro Yamada [Thu, 23 Aug 2018 04:44:15 +0000 (13:44 +0900)]
mmc: tmio: replace tmio_mmc_clk_stop() calls with tmio_mmc_set_clock()

tmio_mmc_clk_stop(host) is equivalent to tmio_mmc_set_clock(host, 0).
This replacement is needed for the next commit.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: jz4740: Add support for the JZ4725B
Paul Cercueil [Tue, 21 Aug 2018 15:21:51 +0000 (17:21 +0200)]
mmc: jz4740: Add support for the JZ4725B

The JZ4725B is the first JZ SoC version that introduced a 32-bit IMASK
register, not the JZ4750.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: use SPDX identifier for Renesas drivers
Wolfram Sang [Tue, 21 Aug 2018 22:02:17 +0000 (00:02 +0200)]
mmc: use SPDX identifier for Renesas drivers

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agodt-bindings: mmc: tmio_mmc: document Renesas R8A77970 bindings
Sergei Shtylyov [Tue, 21 Aug 2018 19:14:12 +0000 (22:14 +0300)]
dt-bindings: mmc: tmio_mmc: document Renesas R8A77970 bindings

Document the R-Car V3M (R8A77970) SoC in the R-Car SDHI bindings -- it's
the usual R-Car gen3 compatible controller with the internal DMA engine.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
7 years agommc: renesas_sdhi_internal_dmac: add R8A77970 to whitelist
Sergei Shtylyov [Sat, 18 Aug 2018 18:08:26 +0000 (21:08 +0300)]
mmc: renesas_sdhi_internal_dmac: add R8A77970 to whitelist

I've successfully tested eMMC on the V3H Starter Kit board and since the
R8A77970 SoC has a single SDHI core, it can't be a subject to the known RX
DMA errata.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
7 years agommc: renesas_sdhi_internal_dmac: Fix a few typos
Sergei Shtylyov [Fri, 17 Aug 2018 20:19:02 +0000 (23:19 +0300)]
mmc: renesas_sdhi_internal_dmac: Fix a few typos

Remove the stray underscore in the DM_CM_DTRAN_MODE.BUS_WIDTH register
field name and fix the typo in the comment of the #define
DTRAN_MODE_CH_NUM_CH1.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: jz4740: Drop dependency on MACH_JZ4740/80
Paul Cercueil [Tue, 21 Aug 2018 13:03:20 +0000 (15:03 +0200)]
mmc: jz4740: Drop dependency on MACH_JZ4740/80

Depending on MACH_JZ4740 | MACH_JZ4780 prevent us from creating a generic
kernel that works on more than one MIPS board. Instead, we just depend on
MIPS being set.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: dw_mmc: hi3798cv200: add MMC_CAP_CMD23 cap
Igor Opaniuk [Mon, 20 Aug 2018 13:04:57 +0000 (16:04 +0300)]
mmc: dw_mmc: hi3798cv200: add MMC_CAP_CMD23 cap

Enable access to the RPMB on the on-board eMMC of the
Poplar board.

Signed-off-by: Igor Opaniuk <igor.opaniuk@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: renesas_sdhi: Add r8a774a1 support
Fabrizio Castro [Tue, 14 Aug 2018 12:34:34 +0000 (13:34 +0100)]
mmc: renesas_sdhi: Add r8a774a1 support

Document RZ/G2M (R8A774A1) SoC bindings.

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Biju Das <biju.das@bp.renesas.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: renesas_sdhi_internal_dmac: Whitelist r8a774a1
Fabrizio Castro [Tue, 14 Aug 2018 12:34:33 +0000 (13:34 +0100)]
mmc: renesas_sdhi_internal_dmac: Whitelist r8a774a1

We need r8a774a1 to be whitelisted for SDHI to work on the RZ/G2M,
but we don't care about the revision of the SoC, so just whitelist
the generic part number.

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Biju Das <biju.das@bp.renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci-of-arasan: Do now show error message in case of deffered probe
Michal Simek [Mon, 6 Aug 2018 08:43:10 +0000 (10:43 +0200)]
mmc: sdhci-of-arasan: Do now show error message in case of deffered probe

When mmc-pwrseq property is passed mmc_pwrseq_alloc() can return
-EPROBE_DEFER because driver for power sequence provider is not probed
yet. Do not show error message when this situation happens.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci-iproc: Add ACPI support
Srinath Mannam [Sun, 5 Aug 2018 07:52:52 +0000 (13:22 +0530)]
mmc: sdhci-iproc: Add ACPI support

Add ACPI support to all IPROC SDHCI variants.

Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agommc: sdhci-pltfm: Convert DT properties to generic device properties
Adrian Hunter [Sun, 5 Aug 2018 07:52:51 +0000 (13:22 +0530)]
mmc: sdhci-pltfm: Convert DT properties to generic device properties

Convert DT properties to generic device properties
so that drivers can get properties from DT or ACPI.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Srinath Mannam <srinath.mannam@broadcom.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
7 years agoLinux 4.19-rc7 v4.19-rc7
Greg Kroah-Hartman [Sun, 7 Oct 2018 15:26:02 +0000 (17:26 +0200)]
Linux 4.19-rc7

7 years agoMerge tag 'char-misc-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Greg Kroah-Hartman [Sun, 7 Oct 2018 06:15:57 +0000 (08:15 +0200)]
Merge tag 'char-misc-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

I wrote:
  "Char/Misc fixes for 4.19-rc7

   Here are 8 small fixes for some char/misc driver issues

   Included here are:
- fpga driver fixes
- thunderbolt bugfixes
- firmware core revert/fix
- hv core fix
- hv tool fix

   All of these have been in linux-next with no reported issues."

* tag 'char-misc-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  thunderbolt: Initialize after IOMMUs
  thunderbolt: Do not handle ICM events after domain is stopped
  firmware: Always initialize the fw_priv list object
  docs: fpga: document fpga manager flags
  fpga: bridge: fix obvious function documentation error
  tools: hv: fcopy: set 'error' in case an unknown operation was requested
  fpga: do not access region struct after fpga_region_unregister
  Drivers: hv: vmbus: Use get/put_cpu() in vmbus_connect()

7 years agoMerge tag 'tty-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Greg Kroah-Hartman [Sun, 7 Oct 2018 06:14:59 +0000 (08:14 +0200)]
Merge tag 'tty-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

I wrote:
  "Serial driver fixes for 4.19-rc7

   Here are 3 small serial driver fixes for 4.19-rc7
    - 2 sh-sci bugfixes for reported issues
    - a revert of the PM handling for the 8250_dw code

   All of these have been in linux-next with no reported issues."

* tag 'tty-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: sh-sci: Allow for compressed SCIF address"
  Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE"
  Revert "serial: 8250_dw: Fix runtime PM handling"

7 years agoMerge tag 'usb-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Greg Kroah-Hartman [Sun, 7 Oct 2018 06:14:06 +0000 (08:14 +0200)]
Merge tag 'usb-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

I wrote:
  "USB fixes for 4.19-rc7

   Here are some small USB fixes for 4.19-rc7

   These include:
     - the usual xhci bugfixes for reported issues
     - some new serial driver device ids
     - bugfix for the option serial driver for some devices
     - bugfix for the cdc_acm driver that has been there for a long time.

   All of these have been in linux-next for a while with no reported
   issues."

* tag 'usb-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: xhci-mtk: resume USB3 roothub first
  xhci: Add missing CAS workaround for Intel Sunrise Point xHCI
  usb: cdc_acm: Do not leak URB buffers
  USB: serial: simple: add Motorola Tetra MTP6550 id
  USB: serial: option: add two-endpoints device-id flag
  USB: serial: option: improve Quectel EP06 detection

7 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Greg Kroah-Hartman [Sun, 7 Oct 2018 05:07:33 +0000 (07:07 +0200)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Wolfram writes:
  "i2c for 4.19

   I2C has three driver bugfixes and a fix for a typo for you."

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: Call i2c_dw_clk_rate() only when calculating timings
  i2c: i2c-scmi: fix for i2c_smbus_write_block_data
  i2c: i2c-isch: fix spelling mistake "unitialized" -> "uninitialized"
  i2c: i2c-qcom-geni: Properly handle DMA safe buffers

7 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Greg Kroah-Hartman [Sun, 7 Oct 2018 05:06:52 +0000 (07:06 +0200)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

James writes:
  "SCSI fixes on 20181006

   Small fix for an unititialized mutex in the qedi driver."

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qedi: Initialize the stats mutex lock

7 years agoMerge tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Greg Kroah-Hartman [Sun, 7 Oct 2018 05:05:43 +0000 (07:05 +0200)]
Merge tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Michael writes:
  "powerpc fixes for 4.19 #4

   Four regression fixes.

   A fix for a change to lib/xz which broke our zImage loader when
   building with XZ compression. OK'ed by Herbert who merged the
   original patch.

   The recent fix we did to avoid patching __init text broke some 32-bit
   machines, fix that.

   Our show_user_instructions() could be tricked into printing kernel
   memory, add a check to avoid that.

   And a fix for a change to our NUMA initialisation logic, which causes
   crashes in some kdump configurations.

   Thanks to:
     Christophe Leroy, Hari Bathini, Jann Horn, Joel Stanley, Meelis
     Roos, Murilo Opsfelder Araujo, Srikar Dronamraju."

* tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/numa: Skip onlining a offline node in kdump path
  powerpc: Don't print kernel instructions in show_user_instructions()
  powerpc/lib: fix book3s/32 boot failure due to code patching
  lib/xz: Put CRC32_POLY_LE in xz_private.h

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Greg Kroah-Hartman [Sat, 6 Oct 2018 09:11:30 +0000 (02:11 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Dave writes:
  "Networking fixes:

  1) Fix truncation of 32-bit right shift in bpf, from Jann Horn.

  2) Fix memory leak in wireless wext compat, from Stefan Seyfried.

  3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao.

  4) Need to cancel pending work when unbinding in smsc75xx otherwise
     we oops, also from Yu Zhao.

  5) Don't allow enslaving a team device to itself, from Ido Schimmel.

  6) Fix backwards compat with older userspace for rtnetlink FDB dumps.
     From Mauricio Faria.

  7) Add validation of tc policy netlink attributes, from David Ahern.

  8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
  ipv6: take rcu lock in rawv6_send_hdrinc()
  net: sched: Add policy validation for tc attributes
  rtnetlink: fix rtnl_fdb_dump() for ndmsg header
  yam: fix a missing-check bug
  net: bpfilter: Fix type cast and pointer warnings
  net: cxgb3_main: fix a missing-check bug
  bpf: 32-bit RSH verification must truncate input before the ALU op
  net: phy: phylink: fix SFP interface autodetection
  be2net: don't flip hw_features when VXLANs are added/deleted
  net/packet: fix packet drop as of virtio gso
  net: dsa: b53: Keep CPU port as tagged in all VLANs
  openvswitch: load NAT helper
  bnxt_en: get the reduced max_irqs by the ones used by RDMA
  bnxt_en: free hwrm resources, if driver probe fails.
  bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
  bnxt_en: Fix VNIC reservations on the PF.
  team: Forbid enslaving team device to itself
  net/usb: cancel pending work when unbinding smsc75xx
  mlxsw: spectrum: Delete RIF when VLAN device is removed
  ...

7 years agoMerge branch 'akpm'
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:33:03 +0000 (16:33 -0700)]
Merge branch 'akpm'

* akpm:
  mm: madvise(MADV_DODUMP): allow hugetlbfs pages
  ocfs2: fix locking for res->tracking and dlm->tracking_list
  mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
  mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
  mm/vmstat.c: fix outdated vmstat_text
  proc: restrict kernel stack dumps to root
  mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
  mm/migrate.c: split only transparent huge pages when allocation fails
  ipc/shm.c: use ERR_CAST() for shm_lock() error return
  mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
  mm, thp: fix mlocking THP page with migration enabled
  ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
  hugetlb: take PMD sharing into account when flushing tlb/caches
  mm: migration: fix migration of huge PMD shared pages

7 years agomm: madvise(MADV_DODUMP): allow hugetlbfs pages
Daniel Black [Fri, 5 Oct 2018 22:52:19 +0000 (15:52 -0700)]
mm: madvise(MADV_DODUMP): allow hugetlbfs pages

Reproducer, assuming 2M of hugetlbfs available:

Hugetlbfs mounted, size=2M and option user=testuser

  # mount | grep ^hugetlbfs
  hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
  # sysctl vm.nr_hugepages=1
  vm.nr_hugepages = 1
  # grep Huge /proc/meminfo
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:       1
  HugePages_Free:        1
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  Hugetlb:            2048 kB

Code:

  #include <sys/mman.h>
  #include <stddef.h>
  #define SIZE 2*1024*1024
  int main()
  {
    void *ptr;
    ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
    madvise(ptr, SIZE, MADV_DONTDUMP);
    madvise(ptr, SIZE, MADV_DODUMP);
  }

Compile and strace:

  mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
  madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
  madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].

The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a
consequence of the large useage of VM_DONTEXPAND in device drivers.

A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.

A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the same
memory.  We correct this omission by allowing madvice(MADV_DODUMP) on
hugetlbfs pages.

[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")

Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com
Link: https://lists.launchpad.net/maria-discuss/msg05245.html
Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoocfs2: fix locking for res->tracking and dlm->tracking_list
Ashish Samant [Fri, 5 Oct 2018 22:52:15 +0000 (15:52 -0700)]
ocfs2: fix locking for res->tracking and dlm->tracking_list

In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock.  This can cause list
corruptions and can end up in kernel panic.

Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.

Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/vmscan.c: fix int overflow in callers of do_shrink_slab()
Kirill Tkhai [Fri, 5 Oct 2018 22:52:10 +0000 (15:52 -0700)]
mm/vmscan.c: fix int overflow in callers of do_shrink_slab()

do_shrink_slab() returns unsigned long value, and the placing into int
variable cuts high bytes off.  Then we compare ret and 0xfffffffe (since
SHRINK_EMPTY is converted to ret type).

Thus a large number of objects returned by do_shrink_slab() may be
interpreted as SHRINK_EMPTY, if low bytes of their value are equal to
0xfffffffe.  Fix that by declaration ret as unsigned long in these
functions.

Link: http://lkml.kernel.org/r/153813407177.17544.14888305435570723973.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
Jann Horn [Fri, 5 Oct 2018 22:52:07 +0000 (15:52 -0700)]
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly

5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even
on UP") made the availability of the NR_TLB_REMOTE_FLUSH* counters inside
the kernel unconditional to reduce #ifdef soup, but (either to avoid
showing dummy zero counters to userspace, or because that code was missed)
didn't update the vmstat_array, meaning that all following counters would
be shown with incorrect values.

This only affects kernel builds with
CONFIG_VM_EVENT_COUNTERS=y && CONFIG_DEBUG_TLBFLUSH=y && CONFIG_SMP=n.

Link: http://lkml.kernel.org/r/20181001143138.95119-2-jannh@google.com
Fixes: 5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/vmstat.c: fix outdated vmstat_text
Jann Horn [Fri, 5 Oct 2018 22:52:03 +0000 (15:52 -0700)]
mm/vmstat.c: fix outdated vmstat_text

7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed the
VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding
entry in vmstat_text.  This causes an out-of-bounds access in
vmstat_show().

Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which
is probably very rare.

Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com
Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoproc: restrict kernel stack dumps to root
Jann Horn [Fri, 5 Oct 2018 22:51:58 +0000 (15:51 -0700)]
proc: restrict kernel stack dumps to root

Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU.  That means that
the stack can change under the stack walker.  The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers.  This can cause exposure of
kernel stack contents to userspace.

Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents.  See the added comment for a longer
rationale.

There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails.  Therefore, I believe
that this change is unlikely to break things.  In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.

Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f50 ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
Anshuman Khandual [Fri, 5 Oct 2018 22:51:54 +0000 (15:51 -0700)]
mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes

ARM64 architecture also supports 32MB and 512MB HugeTLB page sizes.  This
just adds mmap() system call argument encoding for them.

Link: http://lkml.kernel.org/r/1537841300-6979-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/migrate.c: split only transparent huge pages when allocation fails
Anshuman Khandual [Fri, 5 Oct 2018 22:51:51 +0000 (15:51 -0700)]
mm/migrate.c: split only transparent huge pages when allocation fails

split_huge_page_to_list() fails on HugeTLB pages.  I was experimenting
with moving 32MB contig HugeTLB pages on arm64 (with a debug patch
applied) and hit the following stack trace when the kernel crashed.

[ 3732.462797] Call trace:
[ 3732.462835]  split_huge_page_to_list+0x3b0/0x858
[ 3732.462913]  migrate_pages+0x728/0xc20
[ 3732.462999]  soft_offline_page+0x448/0x8b0
[ 3732.463097]  __arm64_sys_madvise+0x724/0x850
[ 3732.463197]  el0_svc_handler+0x74/0x110
[ 3732.463297]  el0_svc+0x8/0xc
[ 3732.463347] Code: d1000400 f90b0e60 f2fbd5a2 a94982a1 (f9000420)

When unmap_and_move[_huge_page]() fails due to lack of memory, the
splitting should happen only for transparent huge pages not for HugeTLB
pages.  PageTransHuge() returns true for both THP and HugeTLB pages.
Hence the conditonal check should test PagesHuge() flag to make sure that
given pages is not a HugeTLB one.

Link: http://lkml.kernel.org/r/1537798495-4996-1-git-send-email-anshuman.khandual@arm.com
Fixes: 94723aafb9 ("mm: unclutter THP migration")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoipc/shm.c: use ERR_CAST() for shm_lock() error return
Kees Cook [Fri, 5 Oct 2018 22:51:48 +0000 (15:51 -0700)]
ipc/shm.c: use ERR_CAST() for shm_lock() error return

This uses ERR_CAST() instead of an open-coded cast, as it is casting
across structure pointers, which upsets __randomize_layout:

ipc/shm.c: In function `shm_lock':
ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm'

  return (void *)ipcp;
         ^~~~~~~~~~~~

Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast
Fixes: 82061c57ce93 ("ipc: drop ipc_lock()")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
YueHaibing [Fri, 5 Oct 2018 22:51:44 +0000 (15:51 -0700)]
mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl

get_user_pages_fast() will return negative value if no pages were pinned,
then be converted to a unsigned, which is compared to zero, giving the
wrong result.

Link: http://lkml.kernel.org/r/20180921095015.26088-1-yuehaibing@huawei.com
Fixes: 09e35a4a1ca8 ("mm/gup_benchmark: handle gup failures")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm, thp: fix mlocking THP page with migration enabled
Kirill A. Shutemov [Fri, 5 Oct 2018 22:51:41 +0000 (15:51 -0700)]
mm, thp: fix mlocking THP page with migration enabled

A transparent huge page is represented by a single entry on an LRU list.
Therefore, we can only make unevictable an entire compound page, not
individual subpages.

If a user tries to mlock() part of a huge page, we want the rest of the
page to be reclaimable.

We handle this by keeping PTE-mapped huge pages on normal LRU lists: the
PMD on border of VM_LOCKED VMA will be split into PTE table.

Introduction of THP migration breaks[1] the rules around mlocking THP
pages.  If we had a single PMD mapping of the page in mlocked VMA, the
page will get mlocked, regardless of PTE mappings of the page.

For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in
remove_migration_pmd().

Anon THP pages can only be shared between processes via fork().  Mlocked
page can only be shared if parent mlocked it before forking, otherwise CoW
will be triggered on mlock().

For Anon-THP, we can fix the issue by munlocking the page on removing PTE
migration entry for the page.  PTEs for the page will always come after
mlocked PMD: rmap walks VMAs from oldest to newest.

Test-case:

#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <linux/mempolicy.h>
#include <numaif.h>

int main(void)
{
        unsigned long nodemask = 4;
        void *addr;

addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

        if (fork()) {
wait(NULL);
return 0;
        }

        mlock(addr, 4UL << 10);
        mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
                &nodemask, 4, MPOL_MF_MOVE);

        return 0;
}

[1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com

Link: http://lkml.kernel.org/r/20180917133816.43995-1-kirill.shutemov@linux.intel.com
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
Larry Chen [Fri, 5 Oct 2018 22:51:37 +0000 (15:51 -0700)]
ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()

ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
is dirty.  When a page has not been written back, it is still in dirty
state.  If ocfs2_duplicate_clusters_by_page() is called against the dirty
page, the crash happens.

To fix this bug, we can just unlock the page and wait until the page until
its not dirty.

The following is the backtrace:

kernel BUG at /root/code/ocfs2/refcounttree.c:2961!
[exception RIP: ocfs2_duplicate_clusters_by_page+822]
__ocfs2_move_extent+0x80/0x450 [ocfs2]
? __ocfs2_claim_clusters+0x130/0x250 [ocfs2]
ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2]
__ocfs2_move_extents_range+0x2a4/0x470 [ocfs2]
ocfs2_move_extents+0x180/0x3b0 [ocfs2]
? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2]
ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
ocfs2_ioctl+0x253/0x640 [ocfs2]
do_vfs_ioctl+0x90/0x5f0
SyS_ioctl+0x74/0x80
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Once we find the page is dirty, we do not wait until it's clean, rather we
use write_one_page() to write it back

Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com
[lchen@suse.com: update comments]
Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Larry Chen <lchen@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agohugetlb: take PMD sharing into account when flushing tlb/caches
Mike Kravetz [Fri, 5 Oct 2018 22:51:33 +0000 (15:51 -0700)]
hugetlb: take PMD sharing into account when flushing tlb/caches

When fixing an issue with PMD sharing and migration, it was discovered via
code inspection that other callers of huge_pmd_unshare potentially have an
issue with cache and tlb flushing.

Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst
case ranges for mmu notifiers.  Ensure that this range is flushed if
huge_pmd_unshare succeeds and unmaps a PUD_SUZE area.

Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm: migration: fix migration of huge PMD shared pages
Mike Kravetz [Fri, 5 Oct 2018 22:51:29 +0000 (15:51 -0700)]
mm: migration: fix migration of huge PMD shared pages

The page migration code employs try_to_unmap() to try and unmap the source
page.  This is accomplished by using rmap_walk to find all vmas where the
page is mapped.  This search stops when page mapcount is zero.  For shared
PMD huge pages, the page map count is always 1 no matter the number of
mappings.  Shared mappings are tracked via the reference count of the PMD
page.  Therefore, try_to_unmap stops prematurely and does not completely
unmap all mappings of the source page.

This problem can result is data corruption as writes to the original
source page can happen after contents of the page are copied to the target
page.  Hence, data is lost.

This problem was originally seen as DB corruption of shared global areas
after a huge page was soft offlined due to ECC memory errors.  DB
developers noticed they could reproduce the issue by (hotplug) offlining
memory used to back huge pages.  A simple testcase can reproduce the
problem by creating a shared PMD mapping (note that this must be at least
PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
migrate_pages() to migrate process pages between nodes while continually
writing to the huge pages being migrated.

To fix, have the try_to_unmap_one routine check for huge PMD sharing by
calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a shared
mapping it will be 'unshared' which removes the page table entry and drops
the reference on the PMD page.  After this, flush caches and TLB.

mmu notifiers are called before locking page tables, but we can not be
sure of PMD sharing until page tables are locked.  Therefore, check for
the possibility of PMD sharing before locking so that notifiers can
prepare for the worst possible case.

Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: make _range_in_vma() a static inline]
Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMerge tag 'pci-v4.19-fixes-3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git...
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:11:16 +0000 (16:11 -0700)]
Merge tag 'pci-v4.19-fixes-3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Bjorn writes:
  "PCI fixes for v4.19:

   - Reprogram bridge prefetch registers to fix NVIDIA and Radeon issues
     after suspend/resume (Daniel Drake)

   - Fix mvebu I/O mapping creation sequence (Thomas Petazzoni)

   - Fix minor MAINTAINERS file match issue (Bjorn Helgaas)"

* tag 'pci-v4.19-fixes-3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Fix PCI I/O mapping creation sequence
  MAINTAINERS: Remove obsolete drivers/pci pattern from ACPI section
  PCI: Reprogram bridge prefetch registers on resume

7 years agoMerge tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:09:56 +0000 (16:09 -0700)]
Merge tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Mike writes:
  "device mapper fixes

   - Fix a DM thinp __udivdi3 undefined on 32-bit bug introduced during
     4.19 merge window.

   - Fix leak and dangling pointer in DM multipath's scsi_dh related code.

   - A couple stable@ fixes for DM cache's resize support.

   - A DM raid fix to remove "const" from decipher_sync_action()'s return
     type."

* tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix resize crash if user doesn't reload cache table
  dm cache metadata: ignore hints array being too small during resize
  dm raid: remove bogus const from decipher_sync_action() return type
  dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer
  dm thin metadata: fix __udivdi3 undefined on 32-bit

7 years agoMerge tag 'gpio-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:09:11 +0000 (16:09 -0700)]
Merge tag 'gpio-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Linus writes:
  "A single GPIO fix:
   Free the last used descriptor, an off by one error.
   This is tagged for stable as well."

* tag 'gpio-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpiolib: Free the last requested descriptor

7 years agoMerge tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:08:12 +0000 (16:08 -0700)]
Merge tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Rafael writes:
  "Power management fix for 4.19-rc7

   Fix a bug that may cause runtime PM to misbehave for some devices
   after a failing or aborted system suspend which is nasty enough for
   an -rc7 time frame fix."

* tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / core: Clear the direct_complete flag on errors

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Oct 2018 23:07:13 +0000 (16:07 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "perf fixes:
    - fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver
    - fix a CPU event enumeration bug in the x86 AMD PMU driver
    - fix a perf ring-buffer corruption bug when using tracepoints
    - fix a PMU unregister locking bug"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
  perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
  perf/ring_buffer: Prevent concurent ring buffer access
  perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
  perf/core: Fix perf_pmu_unregister() locking

7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Oct 2018 22:40:57 +0000 (15:40 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "x86 fixes:

   Misc fixes:

    - fix various vDSO bugs: asm constraints and retpolines
    - add vDSO test units to make sure they never re-appear
    - fix UV platform TSC initialization bug
    - fix build warning on Clang"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vDSO syscall fallback asm constraint regression
  x86/cpu/amd: Remove unnecessary parentheses
  x86/vdso: Only enable vDSO retpolines when enabled and supported
  x86/tsc: Fix UV TSC initialization
  x86/platform/uv: Provide is_early_uv_system()
  selftests/x86: Add clock_gettime() tests to test_vdso
  x86/vdso: Fix asm constraints on vDSO syscall fallbacks

7 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Oct 2018 22:39:38 +0000 (15:39 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "scheduler fixes:

   These fixes address a rather involved performance regression between
   v4.17->v4.19 in the sched/numa auto-balancing code. Since distros
   really need this fix we accelerated it to sched/urgent for a faster
   upstream merge.

   NUMA scheduling and balancing performance is now largely back to
   v4.17 levels, without reintroducing the NUMA placement bugs that
   v4.18 and v4.19 fixed.

   Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for
   reporting, testing, re-testing and solving this rather complex set of
   bugs."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task
  mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration
  sched/numa: Avoid task migration for small NUMA improvement
  mm/migrate: Use spin_trylock() while resetting rate limit
  sched/numa: Limit the conditions where scan period is reset
  sched/numa: Reset scan rate whenever task moves across nodes
  sched/numa: Pass destination CPU as a parameter to migrate_task_rq
  sched/numa: Stop multiple tasks from moving to the CPU at the same time

7 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Fri, 5 Oct 2018 22:38:32 +0000 (15:38 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Ingo writes:
  "locking fixes:

   A fix in the ww_mutex self-test that produces a scary splat, plus an
   updates to the maintained-filed patters in MAINTAINER."

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/ww_mutex: Fix runtime warning in the WW mutex selftest
  MAINTAINERS: Remove dead path from LOCKING PRIMITIVES entry

7 years agoMerge tag 'sound-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Greg Kroah-Hartman [Fri, 5 Oct 2018 22:37:22 +0000 (15:37 -0700)]
Merge tag 'sound-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Takashi writes:
  "sound fixes for 4.19-rc7

   Just two small fixes for HD-audio: one is for a typo in completion
   timeout, and another a fixup for Dell machines as usual"

* tag 'sound-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Cannot adjust speaker's volume on Dell XPS 27 7760
  ALSA: hda: Fix the audio-component completion timeout

7 years agonet: mvpp2: Extract the correct ethtype from the skb for tx csum offload
Maxime Chevallier [Fri, 5 Oct 2018 07:04:40 +0000 (09:04 +0200)]
net: mvpp2: Extract the correct ethtype from the skb for tx csum offload

When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.

The actual driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.

This commit makes use of vlan_protocol_get() to get the correct ethtype
regardless the presence of a vlan tag.

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: take rcu lock in rawv6_send_hdrinc()
Wei Wang [Thu, 4 Oct 2018 17:12:37 +0000 (10:12 -0700)]
ipv6: take rcu lock in rawv6_send_hdrinc()

In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
directly assign the dst to skb and set passed in dst to NULL to avoid
double free.
However, in error case, we free skb and then do stats update with the
dst pointer passed in. This causes use-after-free on the dst.
Fix it by taking rcu read lock right before dst could get released to
make sure dst does not get freed until the stats update is done.
Note: we don't have this issue in ipv4 cause dst is not used for stats
update in v4.

Syzkaller reported following crash:
BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088

CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
 rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000

Allocated by task 32088:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:105
 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
 ip6_route_output include/net/ip6_route.h:88 [inline]
 ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5356:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x83/0x290 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:141
 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2576 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
 rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
 __do_softirq+0x30b/0xad8 kernel/softirq.c:292

Fixes: 1789a640f556 ("raw: avoid two atomics in xmit")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: Add policy validation for tc attributes
David Ahern [Wed, 3 Oct 2018 22:05:36 +0000 (15:05 -0700)]
net: sched: Add policy validation for tc attributes

A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.

The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), nested attribute (OPTIONS which does seem to have
validation in most cases), for dumps only or a flag.

Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: fix rtnl_fdb_dump() for ndmsg header
Mauricio Faria de Oliveira [Tue, 2 Oct 2018 01:46:40 +0000 (22:46 -0300)]
rtnetlink: fix rtnl_fdb_dump() for ndmsg header

Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').

The problem is, the function bails out early if nlmsg_parse() fails, which
does occur for iproute2 usage of 'struct ndmsg' because the payload length
is shorter than the family header alone (as 'struct ifinfomsg' is assumed).

This breaks backward compatibility with userspace -- nothing is sent back.

Some examples with iproute2 and netlink library for go [1]:

 1) $ bridge fdb show
    33:33:00:00:00:01 dev ens3 self permanent
    01:00:5e:00:00:01 dev ens3 self permanent
    33:33:ff:15:98:30 dev ens3 self permanent

      This one works, as it uses 'struct ifinfomsg'.

      fdb_show() @ iproute2/bridge/fdb.c
        """
        .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
        ...
        if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
        """

 2) $ ip --family bridge neigh
    RTNETLINK answers: Invalid argument
    Dump terminated

      This one fails, as it uses 'struct ndmsg'.

      do_show_or_flush() @ iproute2/ip/ipneigh.c
        """
        .n.nlmsg_type = RTM_GETNEIGH,
        .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
        """

 3) $ ./neighlist
    < no output >

      This one fails, as it uses 'struct ndmsg'-based.

      neighList() @ netlink/neigh_linux.go
        """
        req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
        msg := Ndmsg{
        """

The actual breakage was introduced by commit 0ff50e83b512 ("net: rtnetlink:
bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
if the payload length (with the _actual_ family header) is less than the
family header length alone (which is assumed, in parameter 'hdrlen').
This is true in the examples above with struct ndmsg, with size and payload
length shorter than struct ifinfomsg.

However, that commit just intends to fix something under the assumption the
family header is indeed an 'struct ifinfomsg' - by preventing access to the
payload as such (via 'ifm' pointer) if the payload length is not sufficient
to actually contain it.

The assumption was introduced by commit 5e6d24358799 ("bridge: netlink dump
interface at par with brctl"), to support iproute2's 'bridge fdb' command
(not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.

So, in order to unbreak the 'struct ndmsg' family headers and still allow
'struct ifinfomsg' to continue to work, check for the known message sizes
used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
not used in this function anyway) then do not parse the data as ifinfomsg.

Same examples with this patch applied (or revert/before the original fix):

    $ bridge fdb show
    33:33:00:00:00:01 dev ens3 self permanent
    01:00:5e:00:00:01 dev ens3 self permanent
    33:33:ff:15:98:30 dev ens3 self permanent

    $ ip --family bridge neigh
    dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
    dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
    dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT

    $ ./neighlist
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
    netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}

Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).

References:

[1] netlink library for go (test-case)
    https://github.com/vishvananda/netlink

    $ cat ~/go/src/neighlist/main.go
    package main
    import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
    func main() {
        neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
        for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
    }

    $ export GOPATH=~/go
    $ go get github.com/vishvananda/netlink
    $ go build neighlist
    $ ~/go/src/neighlist/neighlist

Thanks to David Ahern for suggestions to improve this patch.

Fixes: 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
Reported-by: Aidan Obley <aobley@pivotal.io>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoyam: fix a missing-check bug
Wenwen Wang [Fri, 5 Oct 2018 15:59:36 +0000 (10:59 -0500)]
yam: fix a missing-check bug

In yam_ioctl(), the concrete ioctl command is firstly copied from the
user-space buffer 'ifr->ifr_data' to 'ioctl_cmd' and checked through the
following switch statement. If the command is not as expected, an error
code EINVAL is returned. In the following execution the buffer
'ifr->ifr_data' is copied again in the cases of the switch statement to
specific data structures according to what kind of ioctl command is
requested. However, after the second copy, no re-check is enforced on the
newly-copied command. Given that the buffer 'ifr->ifr_data' is in the user
space, a malicious user can race to change the command between the two
copies. This way, the attacker can inject inconsistent data and cause
undefined behavior.

This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL will be returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: bpfilter: Fix type cast and pointer warnings
Shanthosh RK [Fri, 5 Oct 2018 15:27:48 +0000 (20:57 +0530)]
net: bpfilter: Fix type cast and pointer warnings

Fixes the following Sparse warnings:

net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer

Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: cxgb3_main: fix a missing-check bug
Wenwen Wang [Fri, 5 Oct 2018 13:48:27 +0000 (08:48 -0500)]
net: cxgb3_main: fix a missing-check bug

In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
the user-space buffer 'useraddr' to 'cmd' and checked through the
switch statement. If the command is not as expected, an error code
EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
switch statement, the whole buffer of 'useraddr' is copied again to a
specific data structure, according to what kind of command is requested.
However, after the second copy, there is no re-check on the newly-copied
command. Given that the buffer 'useraddr' is in the user space, a malicious
user can race to change the command between the two copies. By doing so,
the attacker can supply malicious data to the kernel and cause undefined
behavior.

This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL is returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Fri, 5 Oct 2018 17:53:13 +0000 (10:53 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-10-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix to truncate input on ALU operations in 32 bit mode, from Jann.

2) Fixes for cgroup local storage to reject reserved flags on element
   update and rejection of map allocation with zero-sized value, from Roman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: 32-bit RSH verification must truncate input before the ALU op
Jann Horn [Fri, 5 Oct 2018 16:17:59 +0000 (18:17 +0200)]
bpf: 32-bit RSH verification must truncate input before the ALU op

When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.

That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.

Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.

Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>