www.infradead.org Git - linux-platform-drivers-x86.git/log

cpufreq: intel_pstate: Use most recent guaranteed performance values

When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is
changed later, user space may want to take advantage of this increased
guaranteed performance.

HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an
out-of-band agent or during an Intel Speed Select performance level
change. The HWP_CAP.MAX is still the maximum achievable performance
with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still
change as long as it remains less than or equal to HWP_CAP.MAX.

When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency
attribute shows the most recent guaranteed frequency value. This
attribute can be used by user space software to update the scaling
min/max limits of the CPU.

Currently, the ->setpolicy() callback already uses the latest
HWP_CAP values when setting HWP_REQ, but the ->verify() callback will
restrict the user settings to the to old guaranteed performance value
which prevents user space from making use of the extra CPU capacity
theoretically available to it after increasing HWP_CAP.GUARANTEED.

To address this, read HWP_CAP in intel_pstate_verify_cpu_policy()
to obtain the maximum P-state that can be used and use that to
confine the policy max limit instead of using the cached and
possibly stale pstate.max_freq value for this purpose.

For consistency, update intel_pstate_update_perf_limits() to use the
maximum available P-state returned by intel_pstate_get_hwp_max() to
compute the maximum frequency instead of using the return value of
intel_pstate_get_max_freq() which, again, may be stale.

This issue is a side-effect of fixing the scaling frequency limits in
commit eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max()
for turbo disabled") which corrected the setting of the reduced scaling
frequency values, but caused stale HWP_CAP.GUARANTEED to be used in
the case at hand.

Fixes: eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Implement the ->adjust_perf() callback

Make intel_pstate expose the ->adjust_perf() callback when it
operates in the passive mode with HWP enabled which causes the
schedutil governor to use that callback instead of ->fast_switch().

The minimum and target performance-level values passed by the
governor to ->adjust_perf() are converted to HWP.REQ.MIN and
HWP.REQ.DESIRED, respectively, which allows the processor to
adjust its configuration to maximize energy-efficiency while
providing sufficient capacity.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: Add special-purpose fast-switching callback for drivers

First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.

Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.

Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.

Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: schedutil: Add util to struct sg_cpu

Instead of passing util and max between functions while computing the
utilization and capacity, store the former in struct sg_cpu (along
with the latter and bw_dl).

This will allow the current utilization value to be compared with the
one obtained previously (which is requisite for some code changes to
follow this one), but also it causes the code to look slightly more
consistent and cleaner.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

cppc_cpufreq: replace per-cpu data array with a list

The cppc_cpudata per-cpu storage was inefficient (1) additional to causing
functional issues (2) when CPUs are hotplugged out, due to per-cpu data
being improperly initialised.

(1) The amount of information needed for CPPC performance control in its
    cpufreq driver depends on the domain (PSD) coordination type:

    ANY:    One set of CPPC control and capability data (e.g desired
            performance, highest/lowest performance, etc) applies to all
            CPUs in the domain.

    ALL:    Same as ANY. To be noted that this type is not currently
            supported. When supported, information about which CPUs
            belong to a domain is needed in order for frequency change
            requests to be sent to each of them.

    HW:     It's necessary to store CPPC control and capability
            information for all the CPUs. HW will then coordinate the
            performance state based on their limitations and requests.

    NONE:   Same as HW. No HW coordination is expected.

    Despite this, the previous initialisation code would indiscriminately
    allocate memory for all CPUs (all_cpu_data) and unnecessarily
    duplicate performance capabilities and the domain sharing mask and type
    for each possible CPU.

(2) With the current per-cpu structure, when having ANY coordination,
    the cppc_cpudata cpu information is not initialised (will remain 0)
    for all CPUs in a policy, other than policy->cpu. When policy->cpu is
    hotplugged out, the driver will incorrectly use the uninitialised (0)
    value of the other CPUs when making frequency changes. Additionally,
    the previous values stored in the perf_ctrls.desired_perf will be
    lost when policy->cpu changes.

Therefore replace the array of per cpu data with a list. The memory for
each structure is allocated at policy init, where a single structure
can be allocated per policy, not per cpu. In order to accommodate the
struct list_head node in the cppc_cpudata structure, the now unused cpu
and cur_policy variables are removed.

For example, on a arm64 Juno platform with 6 CPUs: (0, 1, 2, 3) in PSD1,
(4, 5) in PSD2 - ANY coordination, the memory allocation comparison shows:

Before patch:

- ANY coordination:
   total    slack      req alloc/free  caller
       0        0        0     0/1     _kernel_size_le_hi32+0x0xffff800008ff7810
       0        0        0     0/6     _kernel_size_le_hi32+0x0xffff800008ff7808
     128       80       48     1/0     _kernel_size_le_hi32+0x0xffff800008ffc070
     768        0      768     6/0     _kernel_size_le_hi32+0x0xffff800008ffc0e4

After patch:

- ANY coordination:
    total    slack      req alloc/free  caller
     256        0      256     2/0     _kernel_size_le_hi32+0x0xffff800008fed410
       0        0        0     0/2     _kernel_size_le_hi32+0x0xffff800008fed274

Additional notes:
- A pointer to the policy's cppc_cpudata is stored in policy->driver_data
- Driver registration is skipped if _CPC entries are not present.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cppc_cpufreq: expose information on frequency domains

Use the existing sysfs attribute "freqdomain_cpus" to expose
information to userspace about CPUs in the same frequency domain.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cppc_cpufreq: clarify support for coordination types

The previous coordination type handling in the cppc_cpufreq init code
created some confusion: the comment mentioned "Support only SW_ANY for
now" while only the SW_ALL/ALL case resulted in a failure. The other
coordination types (HW_ALL/HW, NONE) were silently supported.

Clarify support for coordination types while describing in comments the
intended behavior.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cppc_cpufreq: use policy->cpu as driver of frequency setting

Considering only the currently supported coordination types (ANY, HW,
NONE), this change only makes a difference for the ANY type, when
policy->cpu is hotplugged out. In that case the new policy->cpu will
be different from ((struct cppc_cpudata *)policy->driver_data)->cpu.

While in this case the controls of *ANY* CPU could be used to drive
frequency changes, it's more consistent to use policy->cpu as the
leading CPU, as used in all other cppc_cpufreq functions. Additionally,
the debug prints in cppc_set_perf() would no longer create confusion
when referring to a CPU that is hotplugged out.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'acpi-processor' to satisfy dependencies

ACPI: processor: fix NONE coordination for domain mapping failure

For errors parsing the _PSD domains, a separate domain is returned for
each CPU in the failed _PSD domain with no coordination (as per previous
comment). But contrary to the intention, the code was setting
CPUFREQ_SHARED_TYPE_ALL as coordination type.

Change shared_type to CPUFREQ_SHARED_TYPE_NONE in case of errors parsing
the domain information. The function still returns the error and the caller
is free to bail out the domain initialisation altogether in that case.

Given that both functions return domains with a single CPU, this change
does not affect the functionality, but clarifies the intention.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull ARM cpufreq updates for 5.11-rc1 from Viresh Kumar:

"This contains the following updates:

- Fix imx's NVMEM_IMX_OCOTP dependency (Arnd Bergmann).

- Add support for mt8167 and blacklist mt8516 (Fabien Parent).

- Some ->get() callback related cleanups to the tegra194 driver and
   some optimizations in tegra186 driver (Jon Hunter and Sumit Gupta).

- Power scale improvements to arm_scmi driver (Lukasz Luba).

- Add missing MODULE_DEVICE_TABLE and MODULE_ALIAS to several drivers
   (Pali Rohár).

- Fix error path in mediatek driver (Qinglang Miao).

- Fix memleak in ST's cpufreq driver (Yangtao Li)."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (22 commits)
  cpufreq: arm_scmi: Discover the power scale in performance protocol
  firmware: arm_scmi: Add power_scale_mw_get() interface
  cpufreq: tegra194: Rename tegra194_get_speed_common function
  cpufreq: tegra194: Remove unnecessary frequency calculation
  cpufreq: tegra186: Simplify cluster information lookup
  cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning
  cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
  cpufreq: vexpress-spc: Add missing MODULE_ALIAS
  cpufreq: scpi: Add missing MODULE_ALIAS
  cpufreq: loongson1: Add missing MODULE_ALIAS
  cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
  cpufreq: st: Add missing MODULE_DEVICE_TABLE
  cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
  cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
  cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
  cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
  cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init
  cpufreq: tegra194: get consistent cpuinfo_cur_freq
  cpufreq: blacklist mt8516 in cpufreq-dt-platdev
  cpufreq: mediatek: Add support for mt8167
  ...

cpufreq: Fix cpufreq_online() return value on errors

Make cpufreq_online() return negative error codes on all errors that
cause the policy to be destroyed, as appropriate.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: Fix up several kerneldoc comments

Fix up the remaining kerneldoc comments that don't adhere to the
expected format and clarify some of them a bit.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: stats: Use local_clock() instead of jiffies

local_clock() has better precision and accuracy as compared to jiffies,
lets use it for time management in cpufreq stats.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: schedutil: Simplify sugov_update_next_freq()

Rearrange a conditional to make it more straightforward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()

Avoid doing the same assignment in both branches of a conditional,
do it after the whole conditional instead.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge back cpufreq material for v5.11.

Merge branch 'cpufreq/scmi' into cpufreq/arm/linux-next

cpufreq: arm_scmi: Discover the power scale in performance protocol

Add mechanism to discover the power scale present in the performance
protocol for all domains. Provide this information to Energy Model,
which then can be checked in other frameworks, e.g. thermal.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

firmware: arm_scmi: Add power_scale_mw_get() interface

Add a new interface to the existing perf_ops and export the information
about the power values scale.

This would be used by the cpufreq driver and Energy Model framework to
set the performance domains scale: milli-Watts or abstract scale.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: tegra194: Rename tegra194_get_speed_common function

The function tegra194_get_speed_common() uses hardware timers to
calculate the current CPUFREQ and so rename this function to be
tegra194_calculate_speed() to reflect what it does.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: tegra194: Remove unnecessary frequency calculation

The Tegra194 CPUFREQ driver sets the CPUFREQ_NEED_INITIAL_FREQ_CHECK
flag which means that the CPUFREQ framework will call the 'get' callback
on boot to determine the current frequency of the CPUs. Therefore, it is
not necessary for the Tegra194 CPUFREQ driver to internally call the
tegra194_get_speed_common() during initialisation to query the current
frequency as well. Fix this by removing the call to the
tegra194_get_speed_common() during initialisation and simplify the code.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: tegra186: Simplify cluster information lookup

The CPUFREQ driver framework references each individual CPUs when
getting and setting the speed. Tegra186 has 3 clusters of A57 CPUs and
1 cluster of Denver CPUs. Hence, the Tegra186 CPUFREQ driver need to
know which cluster a given CPU belongs to. The logic in the Tegra186
driver can be greatly simplified by storing the cluster ID associated
with each CPU in the tegra186_cpufreq_cpu structure. This allow us to
completely remove the Tegra cluster info structure from the driver and
simplifiy the code.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning

Sparse warns that the incorrect type is being assigned to the CPUFREQ
driver_data variable in the Tegra186 CPUFREQ driver. The Tegra186
CPUFREQ driver is assigned a type of 'void __iomem *' to a pointer of
type 'void *' ...

drivers/cpufreq/tegra186-cpufreq.c:72:37: sparse: sparse: incorrect
type in assignment (different address spaces) @@
expected void *driver_data @@ got void [noderef] __iomem * @@
...

drivers/cpufreq/tegra186-cpufreq.c:87:40: sparse: sparse: incorrect
type in initializer (different address spaces) @@
expected void [noderef] __iomem *edvd_reg @@ got void *driver_data @@

The Tegra186 CPUFREQ driver is using the policy->driver_data variable to
store and iomem pointer to a Tegra186 CPU register that is used to set
the clock speed for the CPU. This is not necessary because the register
base address is already stored in the driver data and the offset of the
register for each CPU is static. Therefore, fix this by adding a new
structure with the register offsets for each CPU and store this in the
main driver data structure along with the register base address. Please
note that a new structure has been added for storing the register
offsets rather than a simple array, because this will permit further
clean-ups and simplification of the driver.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: imx: fix NVMEM_IMX_OCOTP dependency

A driver should not 'select' drivers from another subsystem.
If NVMEM is disabled, this one results in a warning:

WARNING: unmet direct dependencies detected for NVMEM_IMX_OCOTP
  Depends on [n]: NVMEM [=n] && (ARCH_MXC [=y] || COMPILE_TEST [=y]) && HAS_IOMEM [=y]
  Selected by [y]:
  - ARM_IMX6Q_CPUFREQ [=y] && CPU_FREQ [=y] && (ARM || ARM64 [=y]) && ARCH_MXC [=y] && REGULATOR_ANATOP [=y]

Change the 'select' to 'depends on' to prevent it from going wrong,
and allow compile-testing without that driver, since it is only
a runtime dependency.

Fixes: 2782ef34ed23 ("cpufreq: imx: Select NVMEM_IMX_OCOTP")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: vexpress-spc: Add missing MODULE_ALIAS

This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 47ac9aa165540 ("cpufreq: arm_big_little: add vexpress SPC interface driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: scpi: Add missing MODULE_ALIAS

This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 8def31034d033 ("cpufreq: arm_big_little: add SCPI interface driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: loongson1: Add missing MODULE_ALIAS

This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: a0a22cf14472f ("cpufreq: Loongson1: Add cpufreq driver for Loongson1B")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: f328584f7bff8 ("cpufreq: Add sun50i nvmem based CPU scaling driver")
Reviewed-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: st: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: ab0ea257fc58d ("cpufreq: st: Provide runtime initialised driver for ST's platforms")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: qcom: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 46e2856b8e188 ("cpufreq: Add Kryo CPU scaling driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 501c574f4e3a5 ("cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: highbank: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 6754f556103be ("cpufreq / highbank: add support for highbank cpufreq")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: ap806: Add missing MODULE_DEVICE_TABLE

This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.

Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: f525a670533d9 ("cpufreq: ap806: add cpufreq driver for Armada 8K")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init

Add the missing platform_driver_unregister() before return from
mtk_cpufreq_driver_init in the error handling case when failed
to register mtk-cpufreq platform device

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: tegra194: get consistent cpuinfo_cur_freq

Frequency returned by 'cpuinfo_cur_freq' using counters is not fixed
and keeps changing slightly. This change returns a consistent value
from freq_table. If the reconstructed frequency has acceptable delta
from the last written value, then return the frequency corresponding
to the last written ndiv value from freq_table. Otherwise, print a
warning and return the reconstructed freq.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: blacklist mt8516 in cpufreq-dt-platdev

Add MT8516 to cpufreq-dt-platdev blacklist since the actual scaling is
handled by the 'mediatek-cpufreq' driver.

Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: mediatek: Add support for mt8167

Add compatible string for mediatek mt8167

Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: sti-cpufreq: fix mem leak in sti_cpufreq_set_opp_info()

Use dev_pm_opp_put_prop_name() to avoid mem leak, which free opp_table.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Yangtao Li <frank@allwinnertech.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

ACPI: processor: Drop duplicate setting of shared_cpu_map

'shared_cpu_map', stored as part of the per-processor
acpi_processor_performance structre, is used to store CPUs that share
a performance domain. By definition it contains the owning CPU.

While building the 'shared_cpu_map' it is being set twice - once while
initialising the performance domains and again when matching CPUs
belonging to the same domain.

Drop the unnecessary initialisation.

Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar:

"This fixes a build issues with SCMI cpufreq driver in the
!CONFIG_COMMON_CLK case."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK

cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK

Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a
dummy clock provider") registers a dummy clock provider using
devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
dependency, but since we plan to move away from the clock dependency
for scmi cpufreq, it is preferrable to avoid that.

Let us just conditionally compile out the offending call to
devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside
of the #ifdef block to avoid build warning.

Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider")
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Linux 5.10-rc5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- Various functionality / regression fixes for Logitech devices from
   Hans de Goede

- Fix for (recently added) GPIO support in mcp2221 driver from Lars
   Povlsen

- Power management handling fix/quirk in i2c-hid driver for certain
   BIOSes that have strange aproach to power-cycle from Hans de Goede

- a few device ID additions and device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
  HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
  HID: Add Logitech Dinovo Edge battery quirk
  HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
  HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
  HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
  HID: mcp2221: Fix GPIO output handling
  HID: hid-sensor-hub: Fix issue with devices with no report ID
  HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
  HID: add support for Sega Saturn
  HID: cypress: Support Varmilo Keyboards' media hotkeys
  HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
  HID: logitech-hidpp: Add PID for MX Anywhere 2
  HID: uclogic: Add ID for Trust Flex Design Tablet

Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:

   - Make the conditional update of the overutilized state work
     correctly by caching the relevant flags state before overwriting
     them and checking them afterwards.

   - Fix a data race in the wakeup path which caused loadavg on ARM64
     platforms to become a random number generator.

   - Fix the ordering of the iowaiter accounting operations so it can't
     be decremented before it is incremented.

   - Fix a bug in the deadline scheduler vs. priority inheritance when a
     non-deadline task A has inherited the parameters of a deadline task
     B and then blocks on a non-deadline task C.

     The second inheritance step used the static deadline parameters of
     task A, which are usually 0, instead of further propagating task
     B's parameters. The zero initialized parameters trigger a bug in
     the deadline scheduler"

* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix priority inheritance with multiple scheduling classes
  sched: Fix rq->nr_iowait ordering
  sched: Fix data-race in wakeup
  sched/fair: Fix overutilized update in enqueue_task_fair()

Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
  attributes instead of device attributes and therefore making clang's
  control flow integrity checker upset"

* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: fix sysfs type mismatches

Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
"A single fix for lockdep which makes the recursion protection cover
graph lock/unlock"

* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Put graph lock/unlock under lock_recursion protection

Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:

   - fix memory leak in efivarfs driver

   - fix HYP mode issue in 32-bit ARM version of the EFI stub when built
     in Thumb2 mode

   - avoid leaking EFI pgd pages on allocation failure"

* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Free efi_pgd with free_pages()
  efivarfs: fix memory leak in efivarfs_create()
  efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP

Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
   same because the proper one is going through the IOMMU tree (Thomas
   Gleixner)

- An Intel microcode loader fix to save the correct microcode patch to
   apply during resume (Chen Yu)

- A fix to not access user memory of other processes when dumping
   opcode bytes (Thomas Gleixner)

* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
  x86/dumpstack: Do not try to access user space code of other tasks
  x86/microcode/intel: Check patch signature before saving microcode for early loading
  iommu/vt-d: Take CONFIG_PCI_ATS into account

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 patches.

  Subsystems affected by this patch series: mm (madvise, pagemap,
  readahead, memcg, userfaultfd), kbuild, and vfs"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix madvise WILLNEED performance problem
  libfs: fix error cast of negative value in simple_attr_write()
  mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
  mm: memcg/slab: fix root memcg vmstats
  mm: fix readahead_page_batch for retry entries
  mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
  compiler-clang: remove version check for BPF Tracing
  mm/madvise: fix memory leak from process_madvise

Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO fixes from Greg KH:
"Here are some small Staging and IIO driver fixes for 5.10-rc5. They
  include:

   - IIO fixes for reported regressions and problems

   - new device ids for IIO drivers

   - new device id for rtl8723bs driver

   - staging ralink driver Kconfig dependency fix

   - staging mt7621-pci bus resource fix

  All of these have been in linux-next all week with no reported issues"

* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
  iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
  docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
  iio: light: fix kconfig dependency bug for VCNL4035
  iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
  iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
  staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
  staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
  staging: mt7621-pci: avoid to request pci bus resources
  iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
  counter/ti-eqep: Fix regmap max_register
  iio: adc: stm32-adc: fix a regression when using dma and irq
  iio: adc: mediatek: fix unset field
  iio: cros_ec: Use default frequencies when EC returns invalid information

Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
"Here are some small tty/serial fixes for 5.10-rc5 that resolve some
  reported issues:

   - speakup crash when telling the kernel to use a device that isn't
     really there

   - imx serial driver fixes for reported problems

   - ar933x_uart driver fix for probe error handling path

  All have been in linux-next for a while with no reported issues"

* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: ar933x_uart: disable clk on error handling path in probe
  tty: serial: imx: keep console clocks always on
  speakup: Do not let the line discipline be used several times
  tty: serial: imx: fix potential deadlock

Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"

* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix bogus warning in ext4_update_dx_flag()
  jbd2: fix kernel-doc markups
  ext4: drop fast_commit from /proc/mounts

afs: Fix speculative status fetch going out of order wrt to modifications

When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes.  What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification.  This was causing the cache to be
invalidated for that vnode when it shouldn't have been.  If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway.  It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix madvise WILLNEED performance problem

The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.

With this fix, we instead see a performance improvement of 3%.

Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

libfs: fix error cast of negative value in simple_attr_write()

The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion.  It will lead to the error cast if user inputs a
negative value.

Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value.  The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly.  Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.

Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.

In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases.  In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock.  Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.

For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm.  The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.

For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.

Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.

Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().

  BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
  Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334

  CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
  Hardware name: IBM 3906 M04 701 (KVM/Linux)
  Call Trace:
    do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
    create_huge_pmd mm/memory.c:4256 [inline]
    __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
    handle_mm_fault+0x288/0x748 mm/memory.c:4607
    do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
    do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
    pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
    copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
    raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
    _copy_from_user+0x48/0xa8 lib/usercopy.c:16
    copy_from_user include/linux/uaccess.h:192 [inline]
    __do_sys_sigaltstack kernel/signal.c:4064 [inline]
    __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Allocated by task 9334:
    slab_alloc_node mm/slub.c:2891 [inline]
    slab_alloc mm/slub.c:2899 [inline]
    kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
    vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
    __split_vma+0xba/0x560 mm/mmap.c:2742
    split_vma+0xca/0x108 mm/mmap.c:2800
    mlock_fixup+0x4ae/0x600 mm/mlock.c:550
    apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
    do_mlock+0x1aa/0x718 mm/mlock.c:711
    __do_sys_mlock2 mm/mlock.c:738 [inline]
    __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Freed by task 9333:
    slab_free mm/slub.c:3142 [inline]
    kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
    __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
    vma_merge+0x87e/0xce0 mm/mmap.c:1209
    userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
    __fput+0x22c/0x7a8 fs/file_table.c:281
    task_work_run+0x200/0x320 kernel/task_work.c:151
    tracehook_notify_resume include/linux/tracehook.h:188 [inline]
    do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
    system_call+0xe6/0x28c arch/s390/kernel/entry.S:416

  The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
  The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
  The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
  raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
  raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
  page dumped because: kasan: bad access detected
  page->mem_cgroup:0000000096959501

  Memory state around the buggy address:
   00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
  >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                        ^
   00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
   00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ==================================================================

Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcg/slab: fix root memcg vmstats

If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for
the root memcg. Now this at least affects the vmstat of
NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is
smaller than the PAGE_SIZE.

David said:
"I assume that without this fix that the root memcg's vmstat would
always be inflated if we reparented"

Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org> [5.3+]
Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix readahead_page_batch for retry entries

Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for.  This was caused by a missing
check in the iterator.

As can be seen in the below panic log, the accessing 0x402 causes a
panic.  In the xarray.h, 0x402 means RETRY_ENTRY.

  BUG: kernel NULL pointer dereference, address: 0000000000000402
  CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
  Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
  RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
  Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
  RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
  RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
  RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
  R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
  FS:  00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
  Call Trace:
    read_pages+0x83/0x270
    page_cache_readahead_unbounded+0x197/0x230
    generic_file_buffered_read+0x57a/0xa20
    new_sync_read+0x112/0x1a0
    vfs_read+0xf8/0x180
    ksys_read+0x5f/0xe0
    do_syscall_64+0x33/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 042124cc64c3 ("mm: add new readahead_control API")
Reported-by: David Sterba <dsterba@suse.com>
Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid().  That
symbol is exported for modules.  However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y

...and:

CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.

Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols.  Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.

The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h.  In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h.  An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

compiler-clang: remove version check for BPF Tracing

bpftrace parses the kernel headers and uses Clang under the hood.

Remove the version check when __BPF_TRACING__ is defined (as bpftrace
does) so that this tool can continue to parse kernel headers, even with
older clang sources.

Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1")
Reported-by: Chen Yu <yu.chen.surf@gmail.com>
Reported-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/madvise: fix memory leak from process_madvise

The early return in process_madvise() will produce a memory leak.

Fix it.

Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
  code on 32-bit arm last week; and a revert of the rmap key comparison
  change from last week as it was totally wrong. I need a vacation. :(

  Summary:

   - Fix various deficiencies in online fsck's metadata checking code

   - Fix an integer casting bug in the xattr code on 32-bit systems

   - Fix a hang in an inode walk when the inode index is corrupt

   - Fix error codes being dropped when initializing per-AG structures

   - Fix nowait directio writes that partially succeed but return EAGAIN

   - Revert last week's rmap comparison patch because it was wrong"

* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: revert "xfs: fix rmap key and record comparison functions"
  xfs: don't allow NOWAIT DIO across extent boundaries
  xfs: return corresponding errcode if xfs_initialize_perag() fail
  xfs: ensure inobt record walks always make forward progress
  xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
  xfs: directory scrub should check the null bestfree entries too
  xfs: strengthen rmap record flags checking
  xfs: fix the minrecs logic when dealing with inode root child blocks

Merge tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"

* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent

Merge tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
  fixes an audit reporting oversight noticed in both seccomp and ptrace.

   - Fix typos in seccomp selftests on powerpc and sh (Kees Cook)

   - Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
     Salaün)"

* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: sh: Fix register names
  selftests/seccomp: powerpc: Fix typo in macro variable name
  seccomp: Set PF_SUPERPRIV when checking capability
  ptrace: Set PF_SUPERPRIV when checking capability

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Fixes for two fairly obscure but annoying when triggered races in
  iSCSI"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: iscsi: Fix cmd abort fabric stop race
  scsi: libiscsi: Fix NOP race condition

Merge tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request from Christoph:
      - Doorbell Buffer freeing fix (Minwoo Im)
      - CSE log leak fix (Keith Busch)

- blk-cgroup hd_struct leak fix (Christoph)

- Flush request state fix (Ming)

- dasd NULL deref fix (Stefan)

* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  s390/dasd: fix null pointer dereference for ERP requests
  blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
  nvme: fix memory leak freeing command effects
  nvme: directly cache command effects log
  nvme: free sq/cq dbbuf pointers when dbbuf set fails
  block: mark flush request as IDLE when it is really finished

Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:

   - Disallow async path resolution of /proc/self

   - Tighten constraints for segmented async buffered reads

   - Fix double completion for a retry error case

   - Fix for fixed file life times (Pavel)"

* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  io_uring: order refnode recycling
  io_uring: get an active ref_node from files_data
  io_uring: don't double complete failed reissue request
  mm: never attempt async page lock if we've transferred data already
  io_uring: handle -EOPNOTSUPP on path resolution
  proc: don't allow async path resolution of /proc/self components

selftests/seccomp: sh: Fix register names

It looks like the seccomp selftests was never actually built for sh.
This fixes it, though I don't have an environment to do a runtime test
of it yet.

Fixes: 0bb605c2c7f2b4b3 ("sh: Add SECCOMP_FILTER")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/lkml/a36d7b48-6598-1642-e403-0c77a86f416d@physik.fu-berlin.de
Signed-off-by: Kees Cook <keescook@chromium.org>

selftests/seccomp: powerpc: Fix typo in macro variable name

A typo sneaked into the powerpc selftest. Fix the name so it builds again.

Fixes: 46138329faea ("selftests/seccomp: powerpc: Fix seccomp return value testing")
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/87y2ix2895.fsf@mpe.ellerman.id.au
Signed-off-by: Kees Cook <keescook@chromium.org>

Merge tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A single fix for avoiding WARN splats when booting a Xen guest with
nosmt"

* tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: don't unbind uninitialized lock_kicker_irq

Merge tag 'dmaengine-fix-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
"A solitary core fix and a few driver fixes:

  Core:

   - channel_register error handling

  Driver fixes:

   - idxd: wq config registers programming and mapping of portal size

   - ioatdma: unused fn removal

   - pl330: fix burst size

   - ti: pm fix on busy and -Wenum-conversion warns

   - xilinx: SG capability check, usage of xilinx_aximcdma_tx_segment,
     readl_poll_timeout_atomic variant"

* tag 'dmaengine-fix-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: fix error codes in channel_register()
  dmaengine: pl330: _prep_dma_memcpy: Fix wrong burst size
  dmaengine: ioatdma: remove unused function missed during dma_v2 removal
  dmaengine: idxd: fix mapping of portal size
  dmaengine: ti: omap-dma: Block PM if SDMA is busy to fix audio
  dmaengine: xilinx_dma: Fix SG capability check for MCDMA
  dmaengine: xilinx_dma: Fix usage of xilinx_aximcdma_tx_segment
  dmaengine: xilinx_dma: use readl_poll_timeout_atomic variant
  dmaengine: ti: k3-udma: fix -Wenum-conversion warning
  dmaengine: idxd: fix wq config registers offset programming

Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull iommu fixes from Will Deacon:
"Two straightforward vt-d fixes:

   - Fix boot when intel iommu initialisation fails under TXT (tboot)

   - Fix intel iommu compilation error when DMAR is enabled without ATS

  and temporarily update IOMMU MAINTAINERs entry"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  MAINTAINERS: Temporarily add myself to the IOMMU entry
  iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
  iommu/vt-d: Avoid panic if iommu init fails in tboot system

Merge tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:

   - sdhci-of-arasan: Stabilize communication by fixing tap value configs

   - sdhci-pci: Use SDR25 timing for HS mode for BYT-based Intel HWs"

* tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-arasan: Issue DLL reset explicitly
  mmc: sdhci-of-arasan: Use Mask writes for Tap delays
  mmc: sdhci-of-arasan: Allow configuring zero tap values
  mmc: sdhci-pci: Prefer SDR25 timing for High Speed mode for BYT-based Intel controllers

Merge tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes: the only core change is a minor error
  code handling in the control API, and all the rest are device-specific
  fixes, mostly quirks, fixups and ASoC Intel fixes.

  It looks boring, and good so"

* tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: mixart: Fix mutex deadlock
  ALSA: hda/ca0132: Fix compile warning without PCI
  ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock
  ALSA: usb-audio: Add delay quirk for all Logitech USB devices
  ASoC: Intel: catpt: Correct clock selection for dai trigger
  ASoC: Intel: catpt: Skip position update for unprepared streams
  ASoC: qcom: lpass-platform: Fix memory leak
  ASoC: Intel: KMB: Fix S24_LE configuration
  ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid
  ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo
  ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
  ASoC: rt1015: increase the time to detect BCLK
  ALSA: ctl: fix error path at adding user-defined element set
  ALSA: hda/realtek - HP Headset Mic can't detect after boot
  ALSA: hda/realtek - Add supported mute Led for HP
  ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
  ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button
  ASoC: rt1015: add delay to fix pop noise from speaker

Merge tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Weekly fixes pull.

  This contains some fixes for sun4i/dw-hdmi probing, then amdgpu
  enables arcturus hw without experimental flag and two other fixes and
  a group of i915 fixes.

  It also has a backported from next fix for the warn on reported in
  ast/drm_gem_vram_helper code in the merge window. There's a separate
  report which initially looked to be the same problem, but I'm going to
  chase that up next week a bit more as I don't think the bisect landed
  anywhere useful.

  Summary:

  core:
   - vram helper TTM regression fix

  amdgpu:
   - Pageflip fix for navi1x with 5 or 6 displays
   - Remove experimental flag for Arcturus
   - Fix regression in atomic commit tail rework

  i915:
   - Fix tgl power gating issue
   - Memory leak fixes
   - Selftest fixes
   - Display bpc fix
   - Fix TGL MOCS for PTE tracking

  dw-hdmi:
   - probing fix

  sun4i:
   - probing fix"

* tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/gt: Fixup tgl mocs for PTE tracking
  drm/vram-helper: Fix use of top-down placement
  drm/i915/gt: Remember to free the virtual breadcrumbs
  drm/i915: Handle max_bpc==16
  drm/amd/display: Always get CRTC updated constant values inside commit tail
  drm/sun4i: backend: Fix probe failure with multiple backends
  drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
  drm/i915/selftests: Fix wrong return value of perf_request_latency()
  drm/i915/selftests: Fix wrong return value of perf_series_engines()
  drm/i915: Avoid memory leak with more than 16 workarounds on a list
  drm/i915/tgl: Fix Media power gate sequence.
  drm/amdgpu: remove experimental flag from arcturus
  drm/amd/display: Add missing pflip irq for dcn2.0
  drm/i915/gvt: return error when failing to take the module reference
  drm: bridge: dw-hdmi: Avoid resetting force in the detect function
  drm/i915/gvt: Set ENHANCED_FRAME_CAP bit
  drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL

ext4: fix bogus warning in ext4_update_dx_flag()

The idea of the warning in ext4_update_dx_flag() is that we should warn
when we are clearing EXT4_INODE_INDEX on a filesystem with metadata
checksums enabled since after clearing the flag, checksums for internal
htree nodes will become invalid. So there's no need to warn (or actually
do anything) when EXT4_INODE_INDEX is not set.

Link: https://lore.kernel.org/r/20201118153032.17281-1-jack@suse.cz
Fixes: 48a34311953d ("ext4: fix checksum errors with indexed dirs")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

jbd2: fix kernel-doc markups

Kernel-doc markup should use this format:
identifier - description

They should not have any type before that, as otherwise
the parser won't do the right thing.

Also, some identifiers have different names between their
prototypes and the kernel-doc markup.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge tag 'drm-intel-fixes-2020-11-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix tgl power gating issue (Rodrigo)
- Memory leak fixes (Tvrtko, Chris)
- Selftest fixes (Zhang)
- Display bpc fix (Ville)
- Fix TGL MOCS for PTE tracking (Chris)

GVT Fixes: It temporarily disables VFIO edid
feature on BXT/APL until its virtual display is really fixed to make
it work properly. And fixes for DPCD 1.2 and error return in taking
module reference.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119203417.GA1795798@intel.com

Merge tag 'drm-misc-fixes-2020-11-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

two patches to fix dw-hdmi bind and detection code, and one fix for
sun4i shared with arm-soc

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119083939.ddj3saipyg5iwvb4@gilmour

xfs: revert "xfs: fix rmap key and record comparison functions"

This reverts commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f.

Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons. While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets. The other bits are not important for key comparisons for index
lookups, and never have been.

Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.

Reported-by: Eric Sandeen <sandeen@sandeen.net>
Fixes: 6ff646b2ceb0 ("xfs: fix rmap key and record comparison functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Merge tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc5, including fixes from the WiFi
  (mac80211), can and bpf (including the strncpy_from_user fix).

  Current release - regressions:

   - mac80211: fix memory leak of filtered powersave frames

   - mac80211: free sta in sta_info_insert_finish() on errors to avoid
     sleeping in atomic context

   - netlabel: fix an uninitialized variable warning added in -rc4

  Previous release - regressions:

   - vsock: forward all packets to the host when no H2G is registered,
     un-breaking AWS Nitro Enclaves

   - net: Exempt multicast addresses from five-second neighbor lifetime
     requirement, decreasing the chances neighbor tables fill up

   - net/tls: fix corrupted data in recvmsg

   - qed: fix ILT configuration of SRC block

   - can: m_can: process interrupt only when not runtime suspended

  Previous release - always broken:

   - page_frag: Recover from memory pressure by not recycling pages
     allocating from the reserves

   - strncpy_from_user: Mask out bytes after NUL terminator

   - ip_tunnels: Set tunnel option flag only when tunnel metadata is
     present, always setting it confuses Open vSwitch

   - bpf, sockmap:
      - Fix partial copy_page_to_iter so progress can still be made
      - Fix socket memory accounting and obeying SO_RCVBUF

   - net: Have netpoll bring-up DSA management interface

   - net: bridge: add missing counters to ndo_get_stats64 callback

   - tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt

   - enetc: Workaround MDIO register access HW bug

   - net/ncsi: move netlink family registration to a subsystem init,
     instead of tying it to driver probe

   - net: ftgmac100: unregister NC-SI when removing driver to avoid
     crash

   - lan743x:
      - prevent interrupt storm on open
      - fix freeing skbs in the wrong context

   - net/mlx5e: Fix socket refcount leak on kTLS RX resync

   - net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097

   - fix 21 unset return codes and other mistakes on error paths, mostly
     detected by the Hulk Robot"

* tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (115 commits)
  fail_function: Remove a redundant mutex unlock
  selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
  lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
  net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid()
  net/smc: fix matching of existing link groups
  ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module
  libbpf: Fix VERSIONED_SYM_COUNT number parsing
  net/mlx4_core: Fix init_hca fields offset
  atm: nicstar: Unmap DMA on send error
  page_frag: Recover from memory pressure
  net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset
  mlxsw: core: Use variable timeout for EMAD retries
  mlxsw: Fix firmware flashing
  net: Have netpoll bring-up DSA management interface
  atl1e: fix error return code in atl1e_probe()
  atl1c: fix error return code in atl1c_probe()
  ah6: fix error return code in ah6_input()
  net: usb: qmi_wwan: Set DTR quirk for MR400
  can: m_can: process interrupt only when not runtime suspended
  can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery
  ...

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"The last two weeks have been quiet here, just the usual smattering of
  long standing bug fixes.

  A collection of error case bug fixes:

   - Improper nesting of spinlock types in cm

   - Missing error codes and kfree()

   - Ensure dma_virt_ops users have the right kconfig symbols to work
     properly

   - Compilation failure of tools/testing"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  tools/testing/scatterlist: Fix test to compile and run
  IB/hfi1: Fix error return code in hfi1_init_dd()
  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
  RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device()
  RDMA/cm: Make the local_id_table xarray non-irq

ext4: drop fast_commit from /proc/mounts

The options in /proc/mounts must be valid mount options --- and
fast_commit is not a mount option.  Otherwise, command sequences like
this will fail:

    # mount /dev/vdc /vdc
    # mkdir -p /vdc/phoronix_test_suite /pts
    # mount --bind /vdc/phoronix_test_suite /pts
    # mount -o remount,nodioread_nolock /pts
    mount: /pts: mount point not mounted or bad option.

And in the system logs, you'll find:

    EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value

Fixes: 995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
1) libbpf should not attempt to load unused subprogs, from Andrii.

2) Make strncpy_from_user() mask out bytes after NUL terminator, from Daniel.

3) Relax return code check for subprograms in the BPF verifier, from Dmitrii.

4) Fix several sockmap issues, from John.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  fail_function: Remove a redundant mutex unlock
  selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
  lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
  libbpf: Fix VERSIONED_SYM_COUNT number parsing
  bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
  bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
  bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
  bpf, sockmap: Use truesize with sk_rmem_schedule()
  bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
  bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made
  selftests/bpf: Fix error return code in run_getsockopt_test()
  bpf: Relax return code check for subprograms
  tools, bpftool: Add missing close before bpftool net attach exit
  MAINTAINERS/bpf: Update Andrii's entry.
  selftests/bpf: Fix unused attribute usage in subprogs_unused test
  bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id
  bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare
  libbpf: Don't attempt to load unused subprog as an entry-point BPF program
====================

Link: https://lore.kernel.org/r/20201119200721.288-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm/i915/gt: Fixup tgl mocs for PTE tracking

Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.

Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
(cherry picked from commit a04ac827366594c7244f60e9be79fcb404af69f0)
Fixes: 849c0fe9e831 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Updated Fixes tag]

Merge tag 'amd-drm-fixes-5.10-2020-11-18' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.10-2020-11-18:

amdgpu:
- Pageflip fix for navi1x with 5 or 6 displays
- Remove experimental flag for Arcturus
- Fix regression in atomic commit tail rework

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118213646.4015-1-alexander.deucher@amd.com

fail_function: Remove a redundant mutex unlock

Fix a mutex_unlock() issue where before copy_from_user() is
not called mutex_locked.

Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/160570737118.263807.8358435412898356284.stgit@devnote2

Merge branch 'Fix bpf_probe_read_user_str() overcopying'

Daniel Xu says:

====================

6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
kernel}_str helpers") introduced a subtle bug where
bpf_probe_read_user_str() would potentially copy a few extra bytes after
the NUL terminator.

This issue is particularly nefarious when strings are used as map keys,
as seemingly identical strings can occupy multiple entries in a map.

This patchset fixes the issue and introduces a selftest to prevent
future regressions.

v6 -> v7:
* Add comments

v5 -> v6:
* zero-pad up to sizeof(unsigned long) after NUL

v4 -> v5:
* don't read potentially uninitialized memory

v3 -> v4:
* directly pass userspace pointer to prog
* test more strings of different length

v2 -> v3:
* set pid filter before attaching prog in selftest
* use long instead of int as bpf_probe_read_user_str() retval
* style changes

v1 -> v2:
* add Fixes: tag
* add selftest
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL

Previously, bpf_probe_read_user_str() could potentially overcopy the
trailing bytes after the NUL due to how do_strncpy_from_user() does the
copy in long-sized strides. The issue has been fixed in the previous
commit.

This commit adds a selftest that ensures we don't regress
bpf_probe_read_user_str() again.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/4d977508fab4ec5b7b574b85bdf8b398868b6ee9.1605642949.git.dxu@dxuuu.xyz

lib/strncpy_from_user.c: Mask out bytes after NUL terminator.

do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter for
normal string operations. However, when BPF programs key BPF maps with
strings, this matters a lot.

A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
destination buffer. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.

The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.

This commit masks out the bytes following the NUL while preserving
long-sized stride in the fast path.

Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz

Merge tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Fixes for CVE-2020-4788.

  From Daniel's cover letter:

  IBM Power9 processors can speculatively operate on data in the L1
  cache before it has been completely validated, via a way-prediction
  mechanism. It is not possible for an attacker to determine the
  contents of impermissible memory using this method, since these
  systems implement a combination of hardware and software security
  measures to prevent scenarios where protected data could be leaked.

  However these measures don't address the scenario where an attacker
  induces the operating system to speculatively execute instructions
  using data that the attacker controls. This can be used for example to
  speculatively bypass "kernel user access prevention" techniques, as
  discovered by Anthony Steinhauser of Google's Safeside Project. This
  is not an attack by itself, but there is a possibility it could be
  used in conjunction with side-channels or other weaknesses in the
  privileged code to construct an attack.

  This issue can be mitigated by flushing the L1 cache between privilege
  boundaries of concern.

  This patch series flushes the L1 cache on kernel entry (patch 2) and
  after the kernel performs any user accesses (patch 3). It also adds a
  self-test and performs some related cleanups"

* tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
  selftests/powerpc: refactor entry and rfi_flush tests
  selftests/powerpc: entry flush test
  powerpc: Only include kup-radix.h for 64-bit Book3S
  powerpc/64s: flush L1D after user accesses
  powerpc/64s: flush L1D on kernel entry
  selftests/powerpc: rfi_flush: disable entry flush if present

Merge tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

- fix placement of cache alias remapping area

- disable preemption around cache alias management calls

- add missing __user annotation to strncpy_from_user argument

* tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: uaccess: Add missing __user to strncpy_from_user() prototype
  xtensa: disable preemption around cache alias management calls
  xtensa: fix TLBTEMP area placement

drm/vram-helper: Fix use of top-down placement

Commit 7053e0eab473 ("drm/vram-helper: stop using TTM placement flags")
cleared the BO placement flags if top-down placement had been selected.
Hence, BOs that were supposed to go into VRAM are now placed in a default
location in system memory.

Trying to scanout the incorrectly pinned BO results in displayed garbage
and an error message.

[  146.108127] ------------[ cut here ]------------
[  146.1V08180] WARNING: CPU: 0 PID: 152 at drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x59/0x60 [drm_vram_helper]
...
[  146.108591]  ast_cursor_page_flip+0x3e/0x150 [ast]
[  146.108622]  ast_cursor_plane_helper_atomic_update+0x8a/0xc0 [ast]
[  146.108654]  drm_atomic_helper_commit_planes+0x197/0x4c0
[  146.108699]  drm_atomic_helper_commit_tail_rpm+0x59/0xa0
[  146.108718]  commit_tail+0x103/0x1c0
...
[  146.109302] ---[ end trace d901a1ba1d949036 ]---

Fix the bug by keeping the placement flags. The top-down placement flag
is stored in a separate variable.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes: 7053e0eab473 ("drm/vram-helper: stop using TTM placement flags")
Reported-by: Pu Wen <puwen@hygon.cn> [for 5.10-rc1]
Tested-by: Pu Wen <puwen@hygon.cn>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200921142536.4392-1-tzimmermann@suse.de
(cherry picked from commit b8f8dbf6495850b0babc551377bde754b7bc0eea)
[pulled into fixes from drm-next]
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'acpi-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix recent regression in the APEI code and initialization issue
  in the ACPI fan driver.

  Specifics:

   - Make the APEI code avoid attempts to obtain logical addresses for
     registers located in the I/O address space to fix initialization
     issues (Aili Yao)

   - Fix sysfs attribute initialization in the ACPI fan driver (Guenter
     Roeck)"

* tag 'acpi-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI, APEI, Fix error return value in apei_map_generic_address()
  ACPI: fan: Initialize performance state sysfs attribute

Merge tag 'pm-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix two issues in ARM cpufreq drivers and one cpuidle driver
  issue.

  Specifics:

   - Add missing RCU_NONIDLE() annotations to the Tegra cpuidle driver
     (Dmitry Osipenko)

   - Fix boot frequency computation in the tegra186 cpufreq driver (Jon
     Hunter)

   - Make the SCMI cpufreq driver register a dummy clock provider to
     avoid OPP addition failures (Sudeep Holla)"

* tag 'pm-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: scmi: Fix OPP addition failure with a dummy clock provider
  cpufreq: tegra186: Fix get frequency callback
  cpuidle: tegra: Annotate tegra_pm_set_cpu_in_lp2() with RCU_NONIDLE

Merge tag 'spi-fix-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"This is a relatively large set of fixes, the bulk of it being a series
  from Lukas Wunner which fixes confusion with the lifetime of driver
  data allocated along with the SPI controller structure that's been
  created as part of the conversion to devm APIs.

  The simplest fix, explained in detail in Lukas' commit message, is to
  move to a devm_ function for allocation of the controller and hence
  driver data in order to push the free of that after anything tries to
  reference the driver data in the remove path. This results in a
  relatively large diff due to the addition of a new function but isn't
  particularly complex.

  There's also a fix from Sven van Asbroeck which fixes yet more fallout
  from the conflicts between the various different places one can
  configure the polarity of GPIOs in modern systems.

  Otherwise everything is fairly small and driver specific"

* tag 'spi-fix-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: npcm-fiu: Don't leak SPI master in probe error path
  spi: dw: Set transfer handler before unmasking the IRQs
  spi: cadence-quadspi: Fix error return code in cqspi_probe
  spi: bcm2835aux: Restore err assignment in bcm2835aux_spi_probe
  spi: lpspi: Fix use-after-free on unbind
  spi: bcm-qspi: Fix use-after-free on unbind
  spi: bcm2835aux: Fix use-after-free on unbind
  spi: bcm2835: Fix use-after-free on unbind
  spi: Introduce device-managed SPI controller allocation
  spi: fsi: Fix transfer returning without finalizing message
  spi: fix client driver breakages when using GPIO descriptors

Merge branch 'net-smc-fixes-2020-11-18'

Karsten Graul says:

====================
net/smc: fixes 2020-11-18

Patch 1 fixes the matching of link groups because with SMC-Dv2 the vlanid
should no longer be part of this matching. Patch 2 removes a sparse message.
====================

Link: https://lore.kernel.org/r/20201118214038.24039-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid()

Sparse complaints 3 times about:
net/smc/smc_ib.c:203:52: warning: incorrect type in argument 1 (different address spaces)
net/smc/smc_ib.c:203:52: expected struct net_device const *dev
net/smc/smc_ib.c:203:52: got struct net_device [noderef] __rcu *const ndev

Fix that by using the existing and validated ndev variable instead of
accessing attr->ndev directly.

Fixes: 5102eca9039b ("net/smc: Use rdma_read_gid_l2_fields to L2 fields")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: fix matching of existing link groups

With the multi-subnet support of SMC-Dv2 the match for existing link
groups should not include the vlanid of the network device.
Set ini->smcd_version accordingly before the call to smc_conn_create()
and use this value in smc_conn_create() to skip the vlanid check.

Fixes: 5c21c4ccafe8 ("net/smc: determine accepted ISM devices")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>