www.infradead.org Git - linux-platform-drivers-x86.git/log

Merge branches 'pm-core', 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap'

* pm-core:
  PM: runtime: Add documentation for pm_runtime_resume_and_get()
  PM: runtime: Replace inline function pm_runtime_callbacks_present()
  PM: core: Remove duplicate declaration from header file

* pm-pci:
  PCI: PM: Do not read power state in pci_enable_device_flags()

* pm-sleep:
  PM: wakeup: remove redundant assignment to variable retval
  PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
  PM: wakeup: use dev_set_name() directly
  PM: sleep: fix typos in comments
  freezer: Remove unused inline function try_to_freeze_nowarn()

* pm-domains:
  PM: domains: Don't runtime resume devices at genpd_prepare()

* powercap:
  powercap: RAPL: Fix struct declaration in header file
  MAINTAINERS: Add DTPM subsystem maintainer
  powercap: Add Hygon Fam18h RAPL support

Merge branch 'pm-cpufreq'

* pm-cpufreq: (22 commits)
  cpufreq: Kconfig: fix documentation links
  cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
  cpufreq: armada-37xx: Fix module unloading
  cpufreq: armada-37xx: Remove cur_frequency variable
  cpufreq: armada-37xx: Fix determining base CPU frequency
  cpufreq: armada-37xx: Fix driver cleanup when registration failed
  clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
  clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
  cpufreq: armada-37xx: Fix the AVS value for load L1
  clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
  cpufreq: armada-37xx: Fix setting TBG parent for load levels
  cpufreq: Remove unused for_each_policy macro
  cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER
  cpufreq: intel_pstate: Clean up frequency computations
  cpufreq: cppc: simplify default delay_us setting
  cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c
  cpufreq: CPPC: Add support for frequency invariance
  ia64: fix format string for ia64-acpi-cpu-freq
  cpufreq: schedutil: Call sugov_update_next_freq() before check to fast_switch_enabled
  arch_topology: Export arch_freq_scale and helpers
  ...

PM: wakeup: remove redundant assignment to variable retval

The variable retval is being initialized with a value that is
never read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check

Hibernation fails on a system in fips mode because md5 is used for the e820
integrity check and is not available. Use crc32 instead.

The check is intended to detect whether the E820 memory map provided
by the firmware after cold boot unexpectedly differs from the one that
was in use when the hibernation image was created. In this case, the
hibernation image cannot be restored, as it may cover memory regions
that are no longer available to the OS.

A non-cryptographic checksum such as CRC-32 is sufficient to detect such
inadvertent deviations.

Fixes: 62a03defeabd ("PM / hibernate: Verify the consistent of e820 memory map by md5 digest")
Reviewed-by: Eric Biggers <ebiggers@google.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Kconfig: fix documentation links

User documentation for cpufreq governors and drivers has been moved to
admin-guide; adjust references from Kconfig entries accordingly.

Remove references from undocumented cpufreq drivers, as well as the
'userspace' cpufreq governor, for which no additional details are
provided in the admin-guide text.

Fixes: 2a0e49279850 ("cpufreq: User/admin documentation update and consolidation")
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PM: wakeup: use dev_set_name() directly

We have open coded dev_set_name() implementation, replace that
with a direct call.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull ARM cpufreq updates for v5.13 from Viresh Kumar:

"- Fix typos in s5pv210 cpufreq driver (Bhaskar Chowdhury).

- Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from
   1000 MHz (Pali Rohár and Marek Behún).

- cpufreq-dt: Return -EPROBE_DEFER on failure to add table (Quanyang
   Wang).

- Minor cleanup in cppc driver (Tom Saeger).

- Add frequency invariance support for CPPC driver and generalize
   freq invariance support arch-topology driver (Viresh Kumar)."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: armada-37xx: Fix module unloading
  cpufreq: armada-37xx: Remove cur_frequency variable
  cpufreq: armada-37xx: Fix determining base CPU frequency
  cpufreq: armada-37xx: Fix driver cleanup when registration failed
  clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
  clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
  cpufreq: armada-37xx: Fix the AVS value for load L1
  clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
  cpufreq: armada-37xx: Fix setting TBG parent for load levels
  cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER
  cpufreq: cppc: simplify default delay_us setting
  cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c
  cpufreq: CPPC: Add support for frequency invariance
  arch_topology: Export arch_freq_scale and helpers
  arch_topology: Allow multiple entities to provide sched_freq_tick() callback
  arch_topology: Rename freq_scale as arch_freq_scale

PM: runtime: Add documentation for pm_runtime_resume_and_get()

Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to
deal with usage counter") added a new runtime-PM API function without
updating the documentation in runtime_pm.rst.

Add the missing documentation.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()

Because pstate.max_freq is always equal to the product of
pstate.max_pstate and pstate.scaling and, analogously,
pstate.turbo_freq is always equal to the product of
pstate.turbo_pstate and pstate.scaling, the result of the
max_policy_perf computation in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_max and pstate.scaling,
regardless of whether or not turbo is disabled. Analogously, the
result of min_policy_perf in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_min and pstate.scaling.

Accordingly, intel_pstate_update_perf_limits() need not check
whether or not turbo is enabled at all and in order to compute
max_policy_perf and min_policy_perf it can always divide policy_max
and policy_min, respectively, by pstate.scaling. Make it do so.

While at it, move the definition and initialization of the
turbo_max local variable to the code branch using it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>

cpufreq: armada-37xx: Fix module unloading

This driver is missing module_exit hook. Add proper driver exit function
which unregisters the platform device and cleans up the data.

Signed-off-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: armada-37xx: Remove cur_frequency variable

Variable cur_frequency in armada37xx_cpufreq_driver_init() is unused.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: armada-37xx: Fix determining base CPU frequency

When current CPU load is not L0 then loading armada-37xx-cpufreq.ko driver
fails with following error:

# modprobe armada-37xx-cpufreq
[ 502.702097] Unsupported CPU frequency 250 MHz

This issue was partially fixed by commit 8db82563451f ("cpufreq:
armada-37xx: fix frequency calculation for opp"), but only for calculating
CPU frequency for opp.

Fix this also for determination of base CPU frequency.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: armada-37xx: Fix driver cleanup when registration failed

Commit 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for
opp") changed calculation of frequency passed to the dev_pm_opp_add()
function call. But the code for dev_pm_opp_remove() function call was not
updated, so the driver cleanup phase does not work when registration fails.

This fixes the issue by using the same frequency in both calls.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 8db82563451f ("cpufreq: armada-37xx: fix frequency calculation for opp")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0

When CPU frequency is at 250 MHz and set_rate() is called with 500 MHz (L1)
quickly followed by a call with 1 GHz (L0), the CPU does not necessarily
stay in L1 for at least 20ms as is required by Marvell errata.

This situation happens frequently with the ondemand cpufreq governor and
can be also reproduced with userspace governor. In most cases it causes CPU
to crash.

This change fixes the above issue and ensures that the CPU always stays in
L1 for at least 20ms when switching from any state to L0.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 61c40f35f5cd ("clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz

It was observed that the workaround introduced by commit 61c40f35f5cd
("clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to
1.2GHz") when base CPU frequency is 1.2 GHz is also required when base
CPU frequency is 1 GHz. Otherwise switching CPU frequency directly from
L2 (250 MHz) to L0 (1 GHz) causes a crash.

When base CPU frequency is just 800 MHz no crashed were observed during
switch from L2 to L0.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 2089dc33ea0e ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: armada-37xx: Fix the AVS value for load L1

The original CPU voltage value for load L1 is too low for Armada 37xx SoC
when base CPU frequency is 1000 or 1200 MHz. It leads to instabilities
where CPU gets stuck soon after dynamic voltage scaling from load L1 to L0.

Update the CPU voltage value for load L1 accordingly when base frequency is
1000 or 1200 MHz. The minimal L1 value for base CPU frequency 1000 MHz is
updated from the original 1.05V to 1.108V and for 1200 MHz is updated to
1.155V. This minimal L1 value is used only in the case when it is lower
than value for L0.

This change fixes CPU instability issues on 1 GHz and 1.2 GHz variants of
Espressobin and 1 GHz Turris Mox.

Marvell previously for 1 GHz variant of Espressobin provided a patch [1]
suitable only for their Marvell Linux kernel 4.4 fork which workarounded
this issue. Patch forced CPU voltage value to 1.108V in all loads. But
such change does not fix CPU instability issues on 1.2 GHz variants of
Armada 3720 SoC.

During testing we come to the conclusion that using 1.108V as minimal
value for L1 load makes 1 GHz variants of Espressobin and Turris Mox boards
stable. And similarly 1.155V for 1.2 GHz variant of Espressobin.

These two values 1.108V and 1.155V are documented in Armada 3700 Hardware
Specifications as typical initial CPU voltage values.

Discussion about this issue is also at the Armbian forum [2].

[1] - https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/dc33b62c90696afb6adc7dbcc4ebbd48bedec269
[2] - https://forum.armbian.com/topic/10429-how-to-make-espressobin-v7-stable/

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 1c3528232f4b ("cpufreq: armada-37xx: Add AVS support")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock

Remove the .set_parent method in clk_pm_cpu_ops.

This method was supposed to be needed by the armada-37xx-cpufreq driver,
but was never actually called due to wrong assumptions in the cpufreq
driver. After this was fixed in the cpufreq driver, this method is not
needed anymore.

Signed-off-by: Marek Behún <kabel@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 2089dc33ea0e ("clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: armada-37xx: Fix setting TBG parent for load levels

With CPU frequency determining software [1] we have discovered that
after this driver does one CPU frequency change, the base frequency of
the CPU is set to the frequency of TBG-A-P clock, instead of the TBG
that is parent to the CPU.

This can be reproduced on EspressoBIN and Turris MOX:
  cd /sys/devices/system/cpu/cpufreq/policy0
  echo powersave >scaling_governor
  echo performance >scaling_governor

Running the mhz tool before this driver is loaded reports 1000 MHz, and
after loading the driver and executing commands above the tool reports
800 MHz.

The change of TBG clock selector is supposed to happen in function
armada37xx_cpufreq_dvfs_setup. Before the function returns, it does
this:
  parent = clk_get_parent(clk);
  clk_set_parent(clk, parent);

The armada-37xx-periph clock driver has the .set_parent method
implemented correctly for this, so if the method was actually called,
this would work. But since the introduction of the common clock
framework in commit b2476490ef11 ("clk: introduce the common clock..."),
the clk_set_parent function checks whether the parent is actually
changing, and if the requested new parent is same as the old parent
(which is obviously the case for the code above), the .set_parent method
is not called at all.

This patch fixes this issue by filling the correct TBG clock selector
directly in the armada37xx_cpufreq_dvfs_setup during the filling of
other registers at the same address. But the determination of CPU TBG
index cannot be done via the common clock framework, therefore we need
to access the North Bridge Peripheral Clock registers directly in this
driver.

[1] https://github.com/wtarreau/mhz

Signed-off-by: Marek Behún <kabel@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Merge back earlier cpuidle updates for v5.13.

Merge tag 'cpuidle-v5.13-rc1' of https://git.linaro.org/people/daniel.lezcano/linux

Pull ARM cpuidle updates for v5.13 from Daniel Lezcano:

"- Fix the C7 state on the tegra114 by setting the L2-no-flush flag
   unconditionally (Dmitry Osipenko)

- Remove the do_idle firmware call as it is not supported by the ATF
   on tegra SoC (Dmitry Osipenko)

- Add a missing dependency on CONFIG_MMU to prevent linkage error (He
   Ying)"

* tag 'cpuidle-v5.13-rc1' of https://git.linaro.org/people/daniel.lezcano/linux:
  cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
  cpuidle: tegra: Remove do_idle firmware call
  cpuidle: tegra: Fix C7 idling state on Tegra114

cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration

When CONFIG_ARM_QCOM_SPM_CPUIDLE is y and CONFIG_MMU is not set,
compiling errors are encountered as follows:

drivers/cpuidle/cpuidle-qcom-spm.o: In function `spm_dev_probe':
cpuidle-qcom-spm.c:(.text+0x140): undefined reference to `cpu_resume_arm'
cpuidle-qcom-spm.c:(.text+0x148): undefined reference to `cpu_resume_arm'

Note that cpu_resume_arm is defined when MMU is set. So, add dependency
on MMU in ARM_QCOM_SPM_CPUIDLE configuration.

Fixes: a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: He Ying <heying24@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210406123328.92904-1-heying24@huawei.com

cpuidle: tegra: Remove do_idle firmware call

The do_idle firmware call is unused by all Tegra SoCs, hence remove it in
order to keep driver's code clean.

Tested-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210302095405.28453-2-digetx@gmail.com

cpuidle: tegra: Fix C7 idling state on Tegra114

Trusted Foundation firmware doesn't implement the do_idle call and in
this case suspending should fall back to the common suspend path. In order
to fix this issue we will unconditionally set the NOFLUSH_L2 mode via
firmware call, which is a NO-OP on Tegra30/124, and then proceed to the
C7 idling, like it was done by the older Tegra114 cpuidle driver.

Fixes: 14e086baca50 ("cpuidle: tegra: Squash Tegra114 driver into the common driver")
Cc: stable@vger.kernel.org # 5.7+
Reported-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114
Tested-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210302095405.28453-1-digetx@gmail.com

PM: sleep: fix typos in comments

Change "occured" to "occurred" in kernel/power/autosleep.c.

Change "consiting" to "consisting" in kernel/power/snapshot.c.

Change "avaiable" to "available" in kernel/power/swap.c.

No functionality changed.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Remove unused for_each_policy macro

Macro 'for_each_policy' has become unused since commit
f963735a3ca3 ("cpufreq: Create for_each_{in}active_policy()"), so
remove it.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_idle: add Iclelake-D support

This patch adds Icelake Xeon D support to the intel_idle driver.

Since Icelake D and Icelake SP C-state characteristics the same,
we use Icelake SP C-states table for Icelake D as well.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PM: runtime: Replace inline function pm_runtime_callbacks_present()

Commit 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()")
forgot to change the inline version.

Fixes: 9a7875461fd0 ("PM: runtime: Replace pm_runtime_callbacks_present()")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

freezer: Remove unused inline function try_to_freeze_nowarn()

There is no caller in tree, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

powercap: RAPL: Fix struct declaration in header file

struct rapl_package is declared twice in intel_rapl.h, once at line 80
and once earlier.

Code inspection suggests that the first instance should be struct
rapl_domain rather than rapl_package, so change it.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: menu: Take negative "sleep length" values into account

Make the menu governor check the tick_nohz_get_next_hrtimer()
return value so as to avoid dealing with negative "sleep length"
values and make it use that value directly when the tick is stopped.

While at it, rename local variable delta_next in menu_select() to
delta_tick which better reflects its purpose.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: teo: Take negative "sleep length" values into account

Modify the TEO governor to take possible negative return values of
tick_nohz_get_next_hrtimer() into account by changing the data type
of some variables used by it to s64 which allows it to carry out
computations without potentially problematic data type conversions
into u64.

Also change the computations in teo_select() so that the negative
values themselves are handled in a natural way to avoid adding extra
negative value checks to that function.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: teo: Adjust handling of very short idle times

If the time till the next timer event is shorter than the target
residency of the first idle state (state 0), the TEO governor does
not update its metrics for any idle states, but arguably it should
record a "hit" for idle state 0 in that case, so modify it to do
that.

Accordingly, also make it record an "early hit" for idle state 0 if
the measured idle duration is less than its target residency, which
allows one branch more to be dropped from teo_update().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Use s64 as exit_latency_ns and target_residency_ns data type

Subsequent changes will cause the exit_latency_ns and target_residency_ns
fields in struct cpuidle_state to be used in computations in which data
type conversions to u64 may turn a negative number close to zero into
a verly large positive number leading to incorrect results.

In preparation for that, change the data type of the fields mentioned
above to s64, but ensure that they will not be negative themselves.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

tick/nohz: Improve tick_nohz_get_next_hrtimer() kerneldoc

Make the tick_nohz_get_next_hrtimer() kerneldoc comment state clearly
that the function may return negative numbers.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PM: core: Remove duplicate declaration from header file

struct device is declared twice, so remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Linux 5.12-rc6

firewire: nosy: Fix a use-after-free bug in nosy_ioctl()

For each device, the nosy driver allocates a pcilynx structure.
A use-after-free might happen in the following scenario:

1. Open nosy device for the first time and call ioctl with command
    NOSY_IOC_START, then a new client A will be malloced and added to
    doubly linked list.
2. Open nosy device for the second time and call ioctl with command
    NOSY_IOC_START, then a new client B will be malloced and added to
    doubly linked list.
3. Call ioctl with command NOSY_IOC_START for client A, then client A
    will be readded to the doubly linked list. Now the doubly linked
    list is messed up.
4. Close the first nosy device and nosy_release will be called. In
    nosy_release, client A will be unlinked and freed.
5. Close the second nosy device, and client A will be referenced,
    resulting in UAF.

The root cause of this bug is that the element in the doubly linked list
is reentered into the list.

Fix this bug by adding a check before inserting a client.  If a client
is already in the linked list, don't insert it.

The following KASAN report reveals it:

   BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210
   Write of size 8 at addr ffff888102ad7360 by task poc
   CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
   Call Trace:
     nosy_release+0x1ea/0x210
     __fput+0x1e2/0x840
     task_work_run+0xe8/0x180
     exit_to_user_mode_prepare+0x114/0x120
     syscall_exit_to_user_mode+0x1d/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   Allocated by task 337:
     nosy_open+0x154/0x4d0
     misc_open+0x2ec/0x410
     chrdev_open+0x20d/0x5a0
     do_dentry_open+0x40f/0xe80
     path_openat+0x1cf9/0x37b0
     do_filp_open+0x16d/0x390
     do_sys_openat2+0x11d/0x360
     __x64_sys_open+0xfd/0x1a0
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   Freed by task 337:
     kfree+0x8f/0x210
     nosy_release+0x158/0x210
     __fput+0x1e2/0x840
     task_work_run+0xe8/0x180
     exit_to_user_mode_prepare+0x114/0x120
     syscall_exit_to_user_mode+0x1d/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128
   The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380)

[ Modified to use 'list_empty()' inside proper lock  - Linus ]

Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/
Reported-and-tested-by: 马哲宇 (Zheyu Ma) <zheyuma97@gmail.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC fix from Stafford Horne:
"Fix duplicate header include in Litex SOC driver"

* tag 'for-linus' of git://github.com/openrisc/linux:
soc: litex: Remove duplicated header file inclusion

Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block

POull io_uring fix from Jens Axboe:
"Just fixing a silly braino in a previous patch, where we'd end up
  failing to compile if CONFIG_BLOCK isn't enabled.

  Not that a lot of people do that, but kernel bot spotted it and it's
  probably prudent to just flush this out now before -rc6.

  Sorry about that, none of my test compile configs have !CONFIG_BLOCK"

* tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block:
  io_uring: fix !CONFIG_BLOCK compilation failure

soc: litex: Remove duplicated header file inclusion

The header file <linux/errno.h> is already included above and can be
removed here.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Mateusz Holenko <mholenko@antmicro.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>

Merge tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:
"Two more gfs2 fixes"

* tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: report "already frozen/thawed" errors
gfs2: Flag a withdraw if init_threads() fails

Merge tag 'riscv-for-linus-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes for 5.12:

   - fix a stack tracing regression related to "const register asm"
     variables, which have unexpected behavior.

   - ensure the value to be written by put_user() is evaluated before
     enabling access to userspace memory..

   - align the exception vector table correctly, so we don't rely on the
     firmware's handling of unaligned accesses.

   - build fix to make NUMA depend on MMU, which triggered on some
     randconfigs"

* tag 'riscv-for-linus-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Make NUMA depend on MMU
  riscv: remove unneeded semicolon
  riscv,entry: fix misaligned base for excp_vect_table
  riscv: evaluate put_user() arg before enabling user access
  riscv: Drop const annotation for sp

Merge tag 'powerpc-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Fix a bug on pseries where spurious wakeups from H_PROD would prevent
  partition migration from succeeding.

  Fix oopses seen in pcpu_alloc(), caused by parallel faults of the
  percpu mapping causing us to corrupt the protection key used for the
  mapping, and cause a fatal key fault.

  Thanks to Aneesh Kumar K.V, Murilo Opsfelder Araujo, and Nathan Lynch"

* tag 'powerpc-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/book3s64: Use the correct storage key value when calling H_PROTECT
  powerpc/pseries/mobility: handle premature return from H_JOIN
  powerpc/pseries/mobility: use struct for shared state

Merge tag 'hyperv-fixes-signed-20210402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fixes from Wei Liu:
"One fix from Lu Yunlong for a double free in hvfb_probe"

* tag 'hyperv-fixes-signed-20210402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
video: hyperv_fb: Fix a double free in hvfb_probe

Merge tag 'driver-core-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
"Here is a single driver core fix for a reported problem with differed
  probing. It has been in linux-next for a while with no reported
  problems"

* tag 'driver-core-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: clear deferred probe reason on probe retry

Merge tag 'char-misc-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few small driver char/misc changes for 5.12-rc6.

  Nothing major here, a few fixes for reported issues:

   - interconnect fixes for problems found

   - fbcon syzbot-found fix

   - extcon fixes

   - firmware stratix10 bugfix

   - MAINTAINERS file update.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
  mei: allow map and unmap of client dma buffer only for disconnected client
  MAINTAINERS: Add linux-phy list and patchwork
  interconnect: Fix kerneldoc warning
  firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
  extcon: Fix error handling in extcon_dev_register
  extcon: Add stubs for extcon_register_notifier_all() functions
  interconnect: core: fix error return code of icc_link_destroy()
  interconnect: qcom: msm8939: remove rpm-ids from non-RPM nodes

Merge tag 'staging-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are two rtl8192e staging driver fixes for reported problems.

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8192e: Change state information from u16 to u8
  staging: rtl8192e: Fix incorrect source in memcpy()

Merge tag 'tty-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fix from Greg KH:
"Here is a single serial driver fix for 5.12-rc6. Is is a revert of a
  change that showed up in 5.9 that has been reported to cause problems.

  It has been in linux-next for a while with no reported issues"

* tag 'tty-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  soc: qcom-geni-se: Cleanup the code to remove proxy votes

Merge tag 'usb-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a few small USB driver fixes for 5.12-rc6 to resolve reported
  problems.

  They include:

   - a number of cdc-acm fixes for reported problems. It seems more
     people are using this driver lately...

   - dwc3 driver fixes for reported problems, and fixes for the fixes :)

   - dwc2 driver fixes for reported issues.

   - musb driver fix.

   - new USB quirk additions.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (23 commits)
  usb: dwc2: Prevent core suspend when port connection flag is 0
  usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
  usb: musb: Fix suspend with devices connected for a64
  usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
  usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
  usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
  USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
  USB: cdc-acm: do not log successful probe on later errors
  USB: cdc-acm: always claim data interface
  USB: cdc-acm: use negation for NULL checks
  USB: cdc-acm: clean up probe error labels
  USB: cdc-acm: drop redundant driver-data reset
  USB: cdc-acm: drop redundant driver-data assignment
  USB: cdc-acm: fix use-after-free after probe failure
  USB: cdc-acm: fix double free on probe failure
  USB: cdc-acm: downgrade message to debug
  USB: cdc-acm: untangle a circular dependency between callback and softint
  cdc-acm: fix BREAK rx code path adding necessary calls
  usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
  usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
  ...

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"A single fix to iscsi for a rare race condition which can cause a
kernel panic"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: Fix race condition between login and sync thread

io_uring: fix !CONFIG_BLOCK compilation failure

kernel test robot correctly pinpoints a compilation failure if
CONFIG_BLOCK isn't set:

fs/io_uring.c: In function '__io_complete_rw':
>> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
    2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
         |                                                ^~~~~~~~~~~~~~~~~~~~
         |                                                io_rw_reissue
    cc1: some warnings being treated as errors

Ensure that we have a stub declaration of io_rw_should_reissue() for
!CONFIG_BLOCK.

Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- Remove comment that never came to fruition in 22 years of development
   (Christoph)

- Remove unused request flag (Christoph)

- Fix for null_blk fake timeout handling (Damien)

- Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)

- Error propagation fix for multiple split bios (Yufen)

* tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  block: remove the unused RQF_ALLOCED flag
  block: update a few comments in uapi/linux/blkpg.h
  block: don't ignore REQ_NOWAIT for direct IO
  null_blk: fix command timeout completion handling
  block: only update parent bi_status when bio fail

Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Nothing really major in here, and finally nothing really related to
  signals. A few minor fixups related to the threading changes, and some
  general fixes, that's it.

  There's the pending gdb-get-confused-about-arch, but that's more of a
  cosmetic issue, nothing that hinder use of it. And given that other
  archs will likely be affected by that oddity too, better to postpone
  any changes there until 5.13 imho"

* tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  io_uring: move reissue into regular IO path
  io_uring: fix EIOCBQUEUED iter revert
  io_uring/io-wq: protect against sprintf overflow
  io_uring: don't mark S_ISBLK async work as unbounded
  io_uring: drop sqd lock before handling signals for SQPOLL
  io_uring: handle setup-failed ctx in kill_timeouts
  io_uring: always go for cancellation spin on exec

Merge tag 'acpi-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI tables management issue, an issue related to the
  ACPI enumeration of devices and CPU wakeup in the ACPI processor
  driver.

  Specifics:

   - Ensure that the memory occupied by ACPI tables on x86 will always
     be reserved to prevent it from being allocated for other purposes
     which was possible in some cases (Rafael Wysocki).

   - Fix the ACPI device enumeration code to prevent it from attempting
     to evaluate the _STA control method for devices with unmet
     dependencies which is likely to fail (Hans de Goede).

   - Fix the handling of CPU0 wakeup in the ACPI processor driver to
     prevent CPU0 online failures from occurring (Vitaly Kuznetsov)"

* tag 'acpi-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
  ACPI: scan: Fix _STA getting called on devices with unmet dependencies
  ACPI: tables: x86: Reserve memory occupied by ACPI tables

Merge tag 'pm-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a race condition and an ordering issue related to using
  device links in the runtime PM framework and two kerneldoc comments in
  cpufreq.

  Specifics:

   - Fix race condition related to the handling of supplier devices
     during consumer device probe and fix the order of decrementation of
     two related reference counters in the runtime PM core code handling
     supplier devices (Adrian Hunter).

   - Fix kerneldoc comments in cpufreq that have not been updated along
     with the functions documented by them (Geert Uytterhoeven)"

* tag 'pm-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: runtime: Fix race getting/putting suppliers at probe
  PM: runtime: Fix ordering in pm_runtime_get_suppliers()
  cpufreq: Fix scaling_{available,boost}_frequencies_show() comments

block: remove the unused RQF_ALLOCED flag

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: update a few comments in uapi/linux/blkpg.h

The big top of the file comment talk about grand plans that never
happened, so remove them to not confuse the readers. Also mark the
devname and volname fields as ignored as they were never used by the
kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Fix stack trace entry size to stop showing garbage

  The macro that creates both the structure and the format displayed to
  user space for the stack trace event was changed a while ago to fix
  the parsing by user space tooling. But this change also modified the
  structure used to store the stack trace event. It changed the caller
  array field from [0] to [8].

  Even though the size in the ring buffer is dynamic and can be
  something other than 8 (user space knows how to handle this), the 8
  extra words was not accounted for when reserving the event on the ring
  buffer, and added 8 more entries, due to the calculation of
  "sizeof(*entry) + nr_entries * sizeof(long)", as the sizeof(*entry)
  now contains 8 entries.

  The size of the caller field needs to be subtracted from the size of
  the entry to create the correct allocation size"

* tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix stack trace event size

io_uring: move reissue into regular IO path

It's non-obvious how retry is done for block backed files, when it happens
off the kiocb done path. It also makes it tricky to deal with the iov_iter
handling.

Just mark the req as needing a reissue, and handling it from the
submission path instead. This makes it directly obvious that we're not
re-importing the iovec from userspace past the submit point, and it means
that we can just reuse our usual -EAGAIN retry path from the read/write
handling.

At some point in the future, we'll gain the ability to always reliably
return -EAGAIN through the stack. A previous attempt on the block side
didn't pan out and got reverted, hence the need to check for this
information out-of-band right now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branches 'acpi-tables' and 'acpi-scan'

* acpi-tables:
ACPI: tables: x86: Reserve memory occupied by ACPI tables

* acpi-scan:
ACPI: scan: Fix _STA getting called on devices with unmet dependencies

Merge branch 'pm-cpufreq'

* pm-cpufreq:
cpufreq: Fix scaling_{available,boost}_frequencies_show() comments

block: don't ignore REQ_NOWAIT for direct IO

If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

riscv: Make NUMA depend on MMU

NUMA is useless when NOMMU, and it leads some build error,
make it depend on MMU.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: remove unneeded semicolon

Eliminate the following coccicheck warning:
./arch/riscv/mm/kasan_init.c:219:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv,entry: fix misaligned base for excp_vect_table

In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.

Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: evaluate put_user() arg before enabling user access

The <asm/uaccess.h> header has a problem with put_user(a, ptr) if
the 'a' is not a simple variable, such as a function. This can lead
to the compiler producing code as so:

1: enable_user_access()
2: evaluate 'a' into register 'r'
3: put 'r' to 'ptr'
4: disable_user_acess()

The issue is that 'a' is now being evaluated with the user memory
protections disabled. So we try and force the evaulation by assigning
'x' to __val at the start, and hoping the compiler barriers in
enable_user_access() do the job of ordering step 2 before step 1.

This has shown up in a bug where 'a' sleeps and thus schedules out
and loses the SR_SUM flag. This isn't sufficient to fully fix, but
should reduce the window of opportunity. The first instance of this
we found is in scheudle_tail() where the code does:

$ less -N kernel/sched/core.c

4263 if (current->set_child_tid)
4264 put_user(task_pid_vnr(current), current->set_child_tid);

Here, the task_pid_vnr(current) is called within the block that has
enabled the user memory access. This can be made worse with KASAN
which makes task_pid_vnr() a rather large call with plenty of
opportunity to sleep.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
Suggested-by: Arnd Bergman <arnd@arndb.de>
--
Changes since v1:
- fixed formatting and updated the patch description with more info

Changes since v2:
- fixed commenting on __put_user() (schwab@linux-m68k.org)

Change since v3:
- fixed RFC in patch title. Should be ready to merge.

Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: Drop const annotation for sp

The const annotation should not be used for 'sp', or it will
become read only and lead to bad stack output.

Fixes: dec822771b01 ("riscv: stacktrace: Move register keyword to beginning of declaration")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

Merge tag 'lto-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull LTO fix from Kees Cook:
"It seems that there is a bug in ld.bfd when doing module section
  merging.

  As explicit merging is only needed for LTO, the work-around is to only
  do it under LTO, leaving the original section layout choices alone
  under normal builds:

   - Only perform explicit module section merges under LTO (Sean
     Christopherson)"

* tag 'lto-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled

kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled

Merge module sections only when using Clang LTO. With ld.bfd, merging
sections does not appear to update the symbol tables for the module,
e.g. 'readelf -s' shows the value that a symbol would have had, if
sections were not merged. ld.lld does not show this problem.

The stale symbol table breaks gdb's function disassembler, and presumably
other things, e.g.

gdb -batch -ex "file arch/x86/kvm/kvm.ko" -ex "disassemble kvm_init"

reads the wrong bytes and dumps garbage.

Fixes: dd2776222abb ("kbuild: lto: merge module sections")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210322234438.502582-1-seanjc@google.com

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"It's a bit larger than I (and probably you) would like by the time we
  get to -rc6, but perhaps not entirely unexpected since the changes in
  the last merge window were larger than usual.

  x86:
   - Fixes for missing TLB flushes with TDP MMU

   - Fixes for race conditions in nested SVM

   - Fixes for lockdep splat with Xen emulation

   - Fix for kvmclock underflow

   - Fix srcdir != builddir builds

   - Other small cleanups

  ARM:
   - Fix GICv3 MMIO compatibility probing

   - Prevent guests from using the ARMv8.4 self-hosted tracing
     extension"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
  KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
  KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
  KVM: x86: reduce pvclock_gtod_sync_lock critical sections
  KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
  KVM: SVM: load control fields from VMCB12 before checking them
  KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
  KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
  KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
  KVM: make: Fix out-of-source module builds
  selftests: kvm: make hardware_disable_test less verbose
  KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
  KVM: x86: remove unused declaration of kvm_write_tsc()
  KVM: clean up the unused argument
  tools/kvm_stat: Add restart delay
  KVM: arm64: Fix CPU interface MMIO compatibility detection
  KVM: arm64: Disable guest access to trace filter controls
  KVM: arm64: Hide system instruction access to Trace registers

Merge tag 'drm-fixes-2021-04-02' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Things have settled down in time for Easter, a random smattering of
  small fixes across a few drivers.

  I'm guessing though there might be some i915 and misc fixes out there
  I haven't gotten yet, but since today is a public holiday here, I'm
  sending this early so I can have the day off, I'll see if more
  requests come in and decide what to do with them later.

  amdgpu:
   - Polaris idle power fix
   - VM fix
   - Vangogh S3 fix
   - Fixes for non-4K page sizes

  amdkfd:
   - dqm fence memory corruption fix

  tegra:
   - lockdep warning fix
   - runtine PM reference fix
   - display controller fix
   - PLL Fix

  imx:
   - memory leak in error path fix
   - LDB driver channel registration fix
   - oob array warning in LDB driver

  exynos
   - unused header file removal"

* tag 'drm-fixes-2021-04-02' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: check alignment on CPU page for bo map
  drm/amdgpu: Set a suitable dev_info.gart_page_size
  drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend
  drm/amdkfd: dqm fence memory corruption
  drm/tegra: sor: Grab runtime PM reference across reset
  drm/tegra: dc: Restore coupling of display controllers
  gpu: host1x: Use different lock classes for each client
  drm/tegra: dc: Don't set PLL clock to 0Hz
  drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
  drm/amd/pm: no need to force MCLK to highest when no display connected
  drm/exynos/decon5433: Remove the unused include statements
  drm/imx: imx-ldb: fix out of bounds array access warning
  drm/imx: imx-ldb: Register LDB channel1 when it is the only channel to be used
  drm/imx: fix memory leak when fails to init

Merge tag 'imx-drm-fixes-2021-04-01' of git://git.pengutronix.de/git/pza/linux into drm-fixes

drm/imx: imx-drm-core and imx-ldb fixes

Fix a memory leak in an error path during DRM device initialization,
fix the LDB driver to register channel 1 even if channel 0 is unused,
and fix an out of bounds array access warning in the LDB driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401092235.GA13586@pengutronix.de

Merge tag 'drm/tegra/for-5.12-rc6' of ssh://git.freedesktop.org/git/tegra/linux into drm-fixes

drm/tegra: Fixes for v5.12-rc6

This contains a couple of fixes for various issues such as lockdep
warnings, runtime PM references, coupled display controllers and
misconfigured PLLs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401163352.3348296-1-thierry.reding@gmail.com

tracing: Fix stack trace event size

Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace
event so that user space could parse it properly. Originally the stack
trace format to user space showed that the called stack was a dynamic
array. But it is not actually a dynamic array, in the way that other
dynamic event arrays worked, and this broke user space parsing for it. The
update was to make the array look to have 8 entries in it. Helper
functions were added to make it parse it correctly, as the stack was
dynamic, but was determined by the size of the event stored.

Although this fixed user space on how it read the event, it changed the
internal structure used for the stack trace event. It changed the array
size from [0] to [8] (added 8 entries). This increased the size of the
stack trace event by 8 words. The size reserved on the ring buffer was the
size of the stack trace event plus the number of stack entries found in
the stack trace. That commit caused the amount to be 8 more than what was
needed because it did not expect the caller field to have any size. This
produced 8 entries of garbage (and reading random data) from the stack
trace event:

<idle>-0 [002] d... 1976396.837549: <stack trace>
=> trace_event_raw_event_sched_switch
=> __traceiter_sched_switch
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> secondary_startup_64_no_verify
=> 0xc8c5e150ffff93de
=> 0xffff93de
=> 0
=> 0
=> 0xc8c5e17800000000
=> 0x1f30affff93de
=> 0x00000004
=> 0x200000000

Instead, subtract the size of the caller field from the size of the event
to make sure that only the amount needed to store the stack trace is
reserved.

Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/
Cc: stable@vger.kernel.org
Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Merge tag 'sound-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Things seem calming down, only usual device-specific fixes for
  HD-audio and USB-audio at this time"

* tag 'sound-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
  ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
  ALSA: hda: Re-add dropped snd_poewr_change_state() calls
  ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
  ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
  ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO

Merge tag 'tomoyo-pr-20210401' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1

Pull tomory fix from Tetsuo Handa:
"An update on 'tomoyo: recognize kernel threads correctly' from Jens
Axboe to not special case PF_IO_WORKER for PF_KTHREAD"

* tag 'tomoyo-pr-20210401' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: don't special case PF_IO_WORKER for PF_KTHREAD

Merge tag 'xarray-5.12' of git://git.infradead.org/users/willy/xarray

Pull XArray fixes from Matthew Wilcox:
"My apologies for the lateness of this. I had a bug reported in the
  test suite, and when I started working on it, I realised I had two
  fixes sitting in the xarray tree since last November. Anyway,
  everything here is fixes, apart from adding xa_limit_16b. The test
  suite passes.

  Summary:

   - Fix a bug when splitting to a non-zero order

   - Documentation fix

   - Add a predefined 16-bit allocation limit

   - Various test suite fixes"

* tag 'xarray-5.12' of git://git.infradead.org/users/willy/xarray:
  idr test suite: Improve reporting from idr_find_test_1
  idr test suite: Create anchor before launching throbber
  idr test suite: Take RCU read lock in idr_find_test_1
  radix tree test suite: Register the main thread with the RCU library
  radix tree test suite: Fix compilation
  XArray: Add xa_limit_16b
  XArray: Fix splitting to non-zero orders
  XArray: Fix split documentation

io_uring: fix EIOCBQUEUED iter revert

iov_iter_revert() is done in completion handlers that happensf before
read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards.
Moreover, even though it may appear being just a no-op, it's actually
races with 1) user forging a new iovec of a different size 2) reissue,
that is done via io-wq continues completely asynchronously.

Fixes: 3e6a0d3c7571c ("io_uring: fix -EAGAIN retry with IOPOLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: protect against sprintf overflow

task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't mark S_ISBLK async work as unbounded

S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: fix command timeout completion handling

Memory backed or zoned null block devices may generate actual request
timeout errors due to the submission path being blocked on memory
allocation or zone locking. Unlike fake timeouts or injected timeouts,
the request submission path will call blk_mq_complete_request() or
blk_mq_end_request() for these real timeout errors, causing a double
completion and use after free situation as the block layer timeout
handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
blk_mq_check_expired(). This problem often triggers a NULL pointer
dereference such as:

BUG: kernel NULL pointer dereference, address: 0000000000000050
RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
...
Call Trace:
  dd_finish_request+0x56/0x80
  blk_mq_free_request+0x37/0x130
  null_handle_cmd+0xbf/0x250 [null_blk]
  ? null_queue_rq+0x67/0xd0 [null_blk]
  blk_mq_dispatch_rq_list+0x122/0x850
  __blk_mq_do_dispatch_sched+0xbb/0x2c0
  __blk_mq_sched_dispatch_requests+0x13d/0x190
  blk_mq_sched_dispatch_requests+0x30/0x60
  __blk_mq_run_hw_queue+0x49/0x90
  process_one_work+0x26c/0x580
  worker_thread+0x55/0x3c0
  ? process_one_work+0x580/0x580
  kthread+0x134/0x150
  ? kthread_create_worker_on_cpu+0x70/0x70
  ret_from_fork+0x1f/0x30

This problem very often triggers when running the full btrfs xfstests
on a memory-backed zoned null block device in a VM with limited amount
of memory.

Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
only for commands that are marked for a fake timeout completion using
the fake_timeout boolean in struct null_cmd. For timeout errors injected
through debugfs, the timeout handler will execute
blk_mq_complete_request()i as before. This is safe as the submission
path does not execute complete requests in this case.

In null_timeout_rq(), also make sure to set the command error field to
BLK_STS_TIMEOUT and to propagate this error through to the request
completion.

Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

idr test suite: Improve reporting from idr_find_test_1

Instead of just reporting an assertion failure, report enough information
that we can start diagnosing exactly went wrong.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

idr test suite: Create anchor before launching throbber

The throbber could race with creation of the anchor entry and cause the
IDR to have zero entries in it, which would cause the test to fail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

idr test suite: Take RCU read lock in idr_find_test_1

When run on a single CPU, this test would frequently access already-freed
memory. Due to timing, this bug never showed up on multi-CPU tests.

Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

radix tree test suite: Register the main thread with the RCU library

Several test runners register individual worker threads with the
RCU library, but neglect to register the main thread, which can lead
to objects being freed while the main thread is in what appears to be
an RCU critical section.

Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()

Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms
with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g.
I'm observing the following on AWS Nitro (e.g r5b.xlarge but other
instance types are affected as well):

# echo 0 > /sys/devices/system/cpu/cpu0/online
# echo 1 > /sys/devices/system/cpu/cpu0/online
<10 seconds delay>
-bash: echo: write error: Input/output error

In fact, the above mentioned commit only revealed the problem and did
not introduce it. On x86, to wakeup CPU an NMI is being used and
hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:

/*
* If NMI wants to wake up CPU0, start CPU0.
*/
if (wakeup_cpu0())
start_cpu0();

cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on
systems where it wasn't called before the above mentioned commit) serves
the same purpose but it doesn't have a path for CPU0. What happens now on
wakeup is:
- NMI is sent to CPU0
- wakeup_cpu0_nmi() works as expected
- we get back to while (1) loop in acpi_idle_play_dead()
- safe_halt() puts CPU0 to sleep again.

The straightforward/minimal fix is add the special handling for CPU0 on x86
and that's what the patch is doing.

Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)

Add a test for the issue when KVM_SET_CLOCK(0) call could cause
TSC page value to go very big because of a signedness issue around
hv_clock->system_time.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210326155551.17446-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()

When guest time is reset with KVM_SET_CLOCK(0), it is possible for
'hv_clock->system_time' to become a small negative number. This happens
because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
kvm_guest_time_update() does (masterclock in use case):

hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;

And 'master_kernel_ns' represents the last time when masterclock
got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
problem, the difference is very small, e.g. I'm observing
hv_clock.system_time = -70 ns. The issue comes from the fact that
'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
compute_tsc_page_parameters() becomes a very big number.

Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
going negative.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken

pvclock_gtod_sync_lock can be taken with interrupts disabled if the
preempt notifier calls get_kvmclock_ns to update the Xen
runstate information:

   spin_lock include/linux/spinlock.h:354 [inline]
   get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
   kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
   kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
   kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
   kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062

So change the users of the spinlock to spin_lock_irqsave and
spin_unlock_irqrestore.

Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: reduce pvclock_gtod_sync_lock critical sections

There is no need to include changes to vcpu->requests into
the pvclock_gtod_sync_lock critical section. The changes to
the shared data structures (in pvclock_update_vm_gtod_copy)
already occur under the lock.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-fix-svm-races' into kvm-master

KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit

Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared. This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@vger.kernel.org
Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: load control fields from VMCB12 before checking them

Avoid races between check and use of the nested VMCB controls. This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host. Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.

This bug is CVE-2021-29657.

Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'amd-drm-fixes-5.12-2021-03-31' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.12-2021-03-31:

amdgpu:
- Polaris idle power fix
- VM fix
- Vangogh S3 fix
- Fixes for non-4K page sizes

amdkfd:
- dqm fence memory corruption fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401020057.17831-1-alexander.deucher@amd.com

Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Just one cleanup which drops of_gpio.h inclusion.
- This header file isn't used anymore so drop it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1617016858-14081-1-git-send-email-inki.dae@samsung.com

drm/amdgpu: check alignment on CPU page for bo map

The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it. Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Set a suitable dev_info.gart_page_size

In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages. So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).

Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend

Do the same thing we do for Renoir. We can check, but since
the sbios has started DPM, it will always return true which
causes the driver to skip some of the SMU init when it shouldn't.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdkfd: dqm fence memory corruption

Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.

Changes since v1:
* Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.

Signed-off-by: Qu Huang <jinsdb@126.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

block: only update parent bi_status when bio fail

For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:

split bio(READ)          kworker

nvme_complete_rq
blk_update_request //split error=0
  bio_endio
    bio_integrity_endio
      queue_work(kintegrityd_wq, &bip->bip_work);

                         bio_integrity_verify_fn
                         bio_endio //split bio
                          __bio_chain_endio
                             if (!parent->bi_status)

                               <interrupt entry>
                               nvme_irq
                                 blk_update_request //parent error=7
                                 req_bio_endio
                                    bio->bi_status = 7 //parent bio
                               <interrupt exit>

                               parent->bi_status = 0
                        parent->bi_end_io() // return bi_status=0

The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.

In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>