Currently, 'pcc_chan_count' is remains set to a non-zero value if PCC
subspaces are parsed successfully but something else fail later during
the initial PCC probing phase. This will result in pcc_mbox_request_channel
trying to access the resources that are not initialised or allocated and
may end up in a system crash.
Reset pcc_chan_count to 0 when the PCC probe fails in order to prevent
the possible issue as described above.
Fixes: ce028702ddbc ("mailbox: pcc: Move bulk of PCCT parsing into pcc_mbox_probe") Signed-off-by: Huisong Li <lihuisong@huawei.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 1fa5ae857bb1 ("driver core: get rid of struct device's
bus_id string array"), the name of device is allocated dynamically,
move dev_set_name() after pnp_add_id() to avoid memory leak.
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
dev_set_name() allocates memory for name, it need be freed
when module exiting, call put_device() to give up reference,
so that it can be freed in kobject_cleanup() when the refcount
hit to 0. The vpe_device is static, so remove kfree() from
vpe_device_release().
Fixes: 17a1d523aa58 ("MIPS: APRP: Add VPE loader support for CMP platforms.") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Afer commit 1fa5ae857bb1 ("driver core: get rid of struct device's
bus_id string array"), the name of device is allocated dynamically,
it need be freed when module exiting, call put_device() to give up
reference, so that it can be freed in kobject_cleanup() when the
refcount hit to 0. The vpe_device is static, so remove kfree() from
vpe_device_release().
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cpufreq_driver->get() callback is supposed to return the current
frequency of the CPU and not the one requested by the CPUFreq core.
Fix it by returning the frequency that gets supplied to the CPU after
the DCVS operation of EPSS/OSM.
Link: https://lkml.kernel.org/r/41651ca1-432a-db34-eb97-d35744559de1@linux.alibaba.com Fixes: 3878f110f71a ("ocfs2: Move the hb_ctl_path sysctl into the stack glue.") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When `timerqueue_getnext()` is called on an empty timer queue, it will
use `rb_entry()` on a NULL pointer, which is invalid. Fix that by using
`rb_entry_safe()` which handles NULL pointers.
This has not caused any issues so far because the offset of the `rb_node`
member in `timerqueue_node` is 0, so `rb_entry()` is essentially a no-op.
Fixes: 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221114195421.342929-1-pobrn@protonmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously, `huawei_wmi_input_setup()` returned the result of
logical or-ing the return values of two functions that return negative
errno-style error codes and one that returns `acpi_status`. If this
returned value was non-zero, then it was propagated from the platform
driver's probe function. That function should return a negative
errno-style error code, so the result of the logical or that
`huawei_wmi_input_setup()` returned was not appropriate.
Fix that by checking each function separately and returning the
error code unmodified.
Fixes: 1ac9abeb2e5b ("platform/x86: huawei-wmi: Move to platform driver") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> Link: https://lore.kernel.org/r/20221005150032.173198-2-pobrn@protonmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1. Var debug_objects_allocated tracks valid kmem_cache_alloc calls, so
track it in debug_objects_replace_static_objects. Do similar things in
object_cpu_offline.
2. In debug_objects_mem_init, there is no need to call function
cpuhp_setup_state_nocalls when debug_objects_enabled = 0 (out of
memory).
Link: https://lkml.kernel.org/r/20220611130634.99741-1-wuchi.zero@gmail.com Fixes: 634d61f45d6f ("debugobjects: Percpu pool lookahead freeing/allocation") Fixes: c4b73aabd098 ("debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done") Signed-off-by: wuchi <wuchi.zero@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In pmu_dev_alloc(), when dev_set_name() failed, it will goto free_dev
and call put_device(pmu->dev) to release it.
However pmu->dev->release is assigned after this, which makes warning
and memleak.
Call dev_set_name() after pmu->dev->release = pmu_dev_release to fix it.
Device '(null)' does not have a release() function...
WARNING: CPU: 2 PID: 441 at drivers/base/core.c:2332 device_release+0x1b9/0x240
...
Call Trace:
<TASK>
kobject_put+0x17f/0x460
put_device+0x20/0x30
pmu_dev_alloc+0x152/0x400
perf_pmu_register+0x96b/0xee0
...
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
unreferenced object 0xffff888014759000 (size 2048):
comm "modprobe", pid 441, jiffies 4294931444 (age 38.332s)
backtrace:
[<0000000005aed3b4>] kmalloc_trace+0x27/0x110
[<000000006b38f9b8>] pmu_dev_alloc+0x50/0x400
[<00000000735f17be>] perf_pmu_register+0x96b/0xee0
[<00000000e38477f1>] 0xffffffffc0ad8603
[<000000004e162216>] do_one_initcall+0xd0/0x4e0
...
The following commit change the second parameter of acpi_set_irq_model()
but forgot to update the function description. Let's fix it.
commit 7327b16f5f56 ("APCI: irq: Add support for multiple GSI domains")
Also add description of parameter 'gsi' for
acpi_get_irq_source_fwhandle() to avoid the following build W=1 warning.
drivers/acpi/irq.c:108: warning: Function parameter or member 'gsi' not described in 'acpi_get_irq_source_fwhandle'
Fixes: 7327b16f5f56 ("APCI: irq: Add support for multiple GSI domains") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, it not only shows the warn message, but also intentionally
introduces a slowdown through sleeping plus serialization mechanism
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.
This slowdown is enough to totally break some proprietary (aka.
unfixable) userspace[1].
Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slows down the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While it is understandable that a malicious application could DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].
Add a sysctl to allow controlling the "misery mode" behavior, as per
Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observing the warning messages on kernel log.
[ dhansen: minor changelog tweaks, including clarifying the actual
problem ]
Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Andre Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The check being unconditional may lead to unwanted denials reported by
LSMs when a process has the capability granted by DAC, but denied by an
LSM. In the case of SELinux such denials are a problem, since they can't
be effectively filtered out via the policy and when not silenced, they
produce noise that may hide a true problem or an attack.
Checking for the capability only if any trusted xattr is actually
present wouldn't really address the issue, since calling listxattr(2) on
such node on its own doesn't indicate an explicit attempt to see the
trusted xattrs. Additionally, it could potentially leak the presence of
trusted xattrs to an unprivileged user if they can check for the denials
(e.g. through dmesg).
Therefore, it's best (and simplest) to keep the check unconditional and
instead use ns_capable_noaudit() that will silence any associated LSM
denials.
Fixes: 38f38657444d ("xattr: extract simple_xattr code from tmpfs") Reported-by: Martin Pitt <mpitt@redhat.com> Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
but the one in the kerneldoc comment of the function is different and
incorrect.
Fixes: ddeb64870810 ("PM / Hibernate: Add sysfs knob to control size of memory for drivers") Signed-off-by: xiongxin <xiongxin@kylinos.cn>
[ rjw: Subject and changelog rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
releasing large enclaves") introduced a cond_resched() during enclave
release where the EREMOVE instruction is applied to every 4k enclave
page. Giving other tasks an opportunity to run while tearing down a
large enclave placates the soft lockup detector but Iqbal found
that the fix causes a 25% performance degradation of a workload
run using Gramine.
Gramine maintains a 1:1 mapping between processes and SGX enclaves.
That means if a workload in an enclave creates a subprocess then
Gramine creates a duplicate enclave for that subprocess to run in.
The consequence is that the release of the enclave used to run
the subprocess can impact the performance of the workload that is
run in the original enclave, especially in large enclaves when
SGX2 is not in use.
The workload run by Iqbal behaves as follows:
Create enclave (enclave "A")
/* Initialize workload in enclave "A" */
Create enclave (enclave "B")
/* Run subprocess in enclave "B" and send result to enclave "A" */
Release enclave (enclave "B")
/* Run workload in enclave "A" */
Release enclave (enclave "A")
The performance impact of releasing enclave "B" in the above scenario
is amplified when there is a lot of SGX memory and the enclave size
matches the SGX memory. When there is 128GB SGX memory and an enclave
size of 128GB, from the time enclave "B" starts the 128GB SGX memory
is oversubscribed with a combined demand for 256GB from the two
enclaves.
Before commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
releasing large enclaves") enclave release was done in a tight loop
without giving other tasks a chance to run. Even though the system
experienced soft lockups the workload (run in enclave "A") obtained
good performance numbers because when the workload started running
there was no interference.
Commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
releasing large enclaves") gave other tasks opportunity to run while an
enclave is released. The impact of this in this scenario is that while
enclave "B" is released and needing to access each page that belongs
to it in order to run the SGX EREMOVE instruction on it, enclave "A"
is attempting to run the workload needing to access the enclave
pages that belong to it. This causes a lot of swapping due to the
demand for the oversubscribed SGX memory. Longer latencies are
experienced by the workload in enclave "A" while enclave "B" is
released.
Improve the performance of enclave release while still avoiding the
soft lockup detector with two enhancements:
- Only call cond_resched() after XA_CHECK_SCHED iterations.
- Use the xarray advanced API to keep the xarray locked for
XA_CHECK_SCHED iterations instead of locking and unlocking
at every iteration.
This batching solution is copied from sgx_encl_may_map() that
also iterates through all enclave pages using this technique.
With this enhancement the workload experiences a 5%
performance degradation when compared to a kernel without
commit 8795359e35bc ("x86/sgx: Silence softlockup detection when
releasing large enclaves"), an improvement to the reported 25%
degradation, while still placating the soft lockup detector.
Scenarios with poor performance are still possible even with these
enhancements. For example, short workloads creating sub processes
while running in large enclaves. Further performance improvements
are pursued in user space through avoiding to create duplicate enclaves
for certain sub processes, and using SGX2 that will do lazy allocation
of pages as needed so enclaves created for sub processes start quickly
and release quickly.
When a pending event exists and growth is less than the threshold, the
current logic is to skip this trigger without generating event. However,
from e6df4ead85d9 ("psi: fix possible trigger missing in the window"),
our purpose is to generate event as long as pending event exists and the
rate meets the limit, no matter what growth is.
This patch handles this case properly.
Fixes: e6df4ead85d9 ("psi: fix possible trigger missing in the window") Signed-off-by: Hao Lee <haolee.swjtu@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/20220919072356.GA29069@haolee.io Signed-off-by: Sasha Levin <sashal@kernel.org>
We only want to take the slow path if SYSCALL_TRACE or SYSCALL_AUDIT is
set; on !AUDIT_SYSCALL configs the current tree hits it whenever _any_
thread flag (including NEED_RESCHED, NOTIFY_SIGNAL, etc.) happens to
be set.
it needs to be added to _TIF_WORK_MASK, or we might not reach
do_work_pending() in the first place...
Fixes: 5a9a8897c253a "alpha: add support for TIF_NOTIFY_SIGNAL" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
While we correctly skips to initialize an idle state from a disabled idle
state node in DT, the returned value from dt_init_idle_driver() don't get
adjusted accordingly. Instead the number of found idle state nodes are
returned, while the callers are expecting the number of successfully
initialized idle states from DT.
This leads to cpuidle drivers unnecessarily continues to initialize their
idle state specific data. Moreover, in the case when all idle states have
been disabled in DT, we would end up registering a cpuidle driver, rather
than relying on the default arch specific idle call.
Fixes: 9f14da345599 ("drivers: cpuidle: implement DT based idle states infrastructure") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the utilization of the woken up task is 0, we skip the energy
calculation because it has no impact.
But if the task is boosted (uclamp_min != 0) will have an impact on task
placement and frequency selection. Only skip if the util is truly
0 after applying uclamp values.
Change uclamp_task_cpu() signature to avoid unnecessary additional calls
to uclamp_eff_get(). feec() is the only user now.
Fixes: 732cd75b8c920 ("sched/fair: Select an energy-efficient CPU on task wake-up") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220804143609.515789-8-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
This fixes a major problem of busy tasks capped with UCLAMP_MAX keeping
the system in overutilized state which disables EAS and leads to wasting
energy in the long run.
Without this patch running a busy background activity like JIT
compilation on Pixel 6 causes the system to be in overutilized state
74.5% of the time.
With this patch this goes down to 9.79%.
It also fixes another problem when long running tasks that have their
UCLAMP_MIN changed while running such that they need to upmigrate to
honour the new UCLAMP_MIN value. The upmigration doesn't get triggered
because overutilized state never gets set in this state, hence misfit
migration never happens at tick in this case until the task wakes up
again.
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220804143609.515789-7-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.
s/asym_fits_capacity/asym_fits_cpu/ to better reflect what it does now.
Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220804143609.515789-6-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Use the new util_fits_cpu() to ensure migration margin and capacity
pressure are taken into account correctly when uclamp is being used
otherwise we will fail to consider CPUs as fitting in scenarios where
they should.
Fixes: b4c9c9f15649 ("sched/fair: Prefer prev cpu in asymmetric wakeup path") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220804143609.515789-5-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
As reported by Yun Hsiang [1], if a task has its uclamp_min >= 0.8 * 1024,
it'll always pick the previous CPU because fits_capacity() will always
return false in this case.
The new util_fits_cpu() logic should handle this correctly for us beside
more corner cases where similar failures could occur, like when using
UCLAMP_MAX.
We open code uclamp_rq_util_with() except for the clamp() part,
util_fits_cpu() needs the 'raw' values to be passed to it.
Also introduce uclamp_rq_{set, get}() shorthand accessors to get uclamp
value for the rq. Makes the code more readable and ensures the right
rules (use READ_ONCE/WRITE_ONCE) are respected transparently.
fits_capacity() verifies that a util is within 20% margin of the
capacity of a CPU, which is an attempt to speed up upmigration.
But when uclamp is used, this 20% margin is problematic because for
example if a task is boosted to 1024, then it will not fit on any CPU
according to fits_capacity() logic.
Or if a task is boosted to capacity_orig_of(medium_cpu). The task will
end up on big instead on the desired medium CPU.
Similar corner cases exist for uclamp and usage of capacity_of().
Slightest irq pressure on biggest CPU for example will make a 1024
boosted task look like it can't fit.
What we really want is for uclamp comparisons to ignore the migration
margin and capacity pressure, yet retain them for when checking the
_actual_ util signal.
For example, task p:
p->util_avg = 300
p->uclamp[UCLAMP_MIN] = 1024
Will fit a big CPU. But
p->util_avg = 900
p->uclamp[UCLAMP_MIN] = 1024
will not, this should trigger overutilized state because the big CPU is
now *actually* being saturated.
Similar reasoning applies to capping tasks with UCLAMP_MAX. For example:
Should fit the task on medium cpus without triggering overutilized
state.
Inlined comments expand more on desired behavior in more scenarios.
Introduce new util_fits_cpu() function which encapsulates the new logic.
The new function is not used anywhere yet, but will be used to update
various users of fits_capacity() in later patches.
Fixes: af24bde8df202 ("sched/uclamp: Add uclamp support to energy_compute()") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220804143609.515789-2-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The error message in __crb_relinquish_locality() mentions requestAccess
instead of Relinquish. Fix it.
Fixes: 888d867df441 ("tpm: cmd_ready command can be issued only after granting locality") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ftpm_mod_init() returns the driver_register() directly without checking
its return value, if driver_register() failed, the ftpm_tee_plat_driver is
not unregistered.
Fix by unregister ftpm_tee_plat_driver when driver_register() failed.
Fixes: 9f1944c23c8c ("tpm_ftpm_tee: register driver on TEE bus") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Maxim Uvarov <maxim.uvarov@linaro.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The check for cancelled request depends on the VID of the chip, but
some chips share VID which shouldn't share their cancellation
behavior. This is the case for the Nuvoton NPCT75X, which should use
the default cancellation check, not the Winbond one.
To avoid changing the existing behavior, add a new flag to indicate
that the chip should use the default cancellation check and set it
for the I2C TPM2 TIS driver.
Fixes: bbc23a07b072 ("tpm: Add tpm_tis_i2c backend for tpm_tis_core") Signed-off-by: Eddie James <eajames@linux.ibm.com> Tested-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sanity check mask for TPM_INT_ENABLE register was off by 8 bits,
resulting in failure to probe if the TPM_INT_ENABLE register was a
valid value.
Fixes: bbc23a07b072 ("tpm: Add tpm_tis_i2c backend for tpm_tis_core") Signed-off-by: Eddie James <eajames@linux.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
An oops can be induced by running 'cat /proc/kcore > /dev/null' on
devices using pstore with the ram backend because kmap_atomic() assumes
lowmem pages are accessible with __va().
During early boot, memblock reserves the pages for the ramoops reserved
memory node in DT that would otherwise be part of the direct lowmem
mapping. Pstore's ram backend reuses those reserved pages to change the
memory type (writeback or non-cached) by passing the pages to vmap()
(see pfn_to_page() usage in persistent_ram_vmap() for more details) with
specific flags. When read_kcore() starts iterating over the vmalloc
region, it runs over the virtual address that vmap() returned for
ramoops. In aligned_vread() the virtual address is passed to
vmalloc_to_page() which returns the page struct for the reserved lowmem
area. That lowmem page is passed to kmap_atomic(), which effectively
calls page_to_virt() that assumes a lowmem page struct must be directly
accessible with __va() and friends. These pages are mapped via vmap()
though, and the lowmem mapping was never made, so accessing them via the
lowmem virtual address oopses like above.
Let's side-step this problem by passing VM_IOREMAP to vmap(). This will
tell vread() to not include the ramoops region in the kcore. Instead the
area will look like a bunch of zeros. The alternative is to teach kmap()
about vmalloc areas that intersect with lowmem. Presumably such a change
isn't a one-liner, and there isn't much interest in inspecting the
ramoops region in kcore files anyway, so the most expedient route is
taken for now.
Cc: Brian Geffon <bgeffon@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221205233136.3420802-1-swboyd@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
timer_read() was using an empty 100-iteration loop to wait for the
TMR_CVWR register to capture the latest timer counter value. The delay
wasn't long enough. This resulted in CPU idle time being extremely
underreported on PXA168 with CONFIG_NO_HZ_IDLE=y.
Switch to the approach used in the vendor kernel, which implements the
capture delay by reading TMR_CVWR a few times instead.
Fixes: 49cbe78637eb ("[ARM] pxa: add base support for Marvell's PXA168 processor line") Signed-off-by: Doug Brown <doug@schmorgal.com> Link: https://lore.kernel.org/r/20221204005117.53452-3-doug@schmorgal.com Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the if (dev_of_node(dev) && !pdata) path, the "err" may be assigned a
value of 0, so the error return code -EINVAL may be incorrectly set
to 0. To fix set valid return code before calling to goto.
Our syzbot instance reported memory leaks in do_seccomp() [0], similar
to the report [1]. It shows that we miss freeing struct seccomp_filter
and some objects included in it.
We can reproduce the issue with the program below [2] which calls one
seccomp() and two clone() syscalls.
The first clone()d child exits earlier than its parent and sends a
signal to kill it during the second clone(), more precisely before the
fatal_signal_pending() test in copy_process(). When the parent receives
the signal, it has to destroy the embryonic process and return -EINTR to
user space. In the failure path, we have to call seccomp_filter_release()
to decrement the filter's refcount.
Initially, we called it in free_task() called from the failure path, but
the commit 3a15fb6ed92c ("seccomp: release filter after task is fully
dead") moved it to release_task() to notify user space as early as possible
that the filter is no longer used.
To keep the change and current seccomp refcount semantics, let's move
copy_seccomp() just after the signal check and add a WARN_ON_ONCE() in
free_task() for future debugging.
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
Fixes: 538da83ddbea ("ARM: mvebu: add Device Tree files for Armada 39x SoC and board") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
Fixes: 0d3d96ab0059 ("ARM: mvebu: add Device Tree description of the Armada 380/385 SoCs") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
Fixes: 4de59085091f ("ARM: mvebu: add Device Tree description of the Armada 375 SoC") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
Fixes: 9d8f44f02d4a ("arm: mvebu: add PCIe Device Tree informations for Armada XP") Fixes: 12b69a599745 ("ARM: mvebu: second PCIe unit of Armada XP mv78230 is only x1 capable") Fixes: 2163e61c92d9 ("ARM: mvebu: fix second and third PCIe unit of Armada XP mv78260") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
Fixes: a09a0b7c6ff1 ("arm: mvebu: add PCIe Device Tree informations for Armada 370") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BDF of resource in DT assigned-addresses property of Marvell PCIe Root Port
(PCI-to-PCI bridge) should match BDF in address part in that DT node name
as specified resource belongs to Marvell PCIe Root Port itself.
commit edf408b946d3 ("PCI: dwc: Validate iATU outbound mappings against
hardware constraints") exposes an issue with the existing partitioning of
the aperture space where the Prefetchable apertures of controllers
C5, C7 and C9 in Tegra234 cross the 32GB boundary hardware constraint.
This patch makes sure that the Prefetchable region doesn't spill over
the 32GB boundary.
The unit address for the pinctrl node is (0x)1000b000 and not
(0x)10005000, which is the syscfg_pctl_a address instead.
This fixes the following warning:
arch/arm64/boot/dts/mediatek/mt2712e.dtsi:264.40-267.4: Warning
(unique_unit_address): /syscfg_pctl_a@10005000: duplicate
unit-address (also used in node /pinctrl@10005000)
Rename fixed-clock oscillators to oscillator-26m and oscillator-32k
and remove the unit address to fix the unit_address_vs_reg warning;
fix the unit address for interrupt and intpol controllers by
removing a leading zero in their unit address.
This commit fixes the following warnings:
(unit_address_vs_reg): /oscillator@0: node has a unit name, but
no reg or ranges property
(unit_address_vs_reg): /oscillator@1: node has a unit name, but
no reg or ranges property
(simple_bus_reg): /soc/interrupt-controller@0c000000: simple-bus
unit address format error, expected "c000000"
(simple_bus_reg): /soc/intpol-controller@0c53a650: simple-bus
unit address format error, expected "c53a650"
Rename the oscillator fixed-clock to oscillator-40m and remove
the unit address to fix warnings.
arch/arm64/boot/dts/mediatek/mt7986a.dtsi:17.23-22.4: Warning
(unit_address_vs_reg): /oscillator@0: node has a unit name,
but no reg or ranges property
Fixes: 1f9986b258c2 ("arm64: dts: mediatek: add clock support for mt7986a") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20221013152212.416661-2-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The capacity-dmips-mhz parameter was miscalculated: this SoC runs
the first (Cortex-A55) cluster at a maximum of 2000MHz and the
second (Cortex-A78) cluster at a maximum of 3000MHz.
In order to calculate the right capacity-dmips-mhz, the following
test was performed:
1. CPUFREQ governor was set to 'performance' on both clusters
2. Ran dhrystone with 500000000 iterations for 10 times on each cluster
3. Calculate the mean result for each cluster
4. Calculate DMIPS/MHz: dmips_mhz = dmips_per_second / cpu_mhz
5. Scale results to 1024:
result_c0 = (dmips_mhz_c0 - min_dmips_mhz(c0, c1)) /
(max_dmips_mhz(c0, c1) - min_dmips_mhz(c0, c1)) * 1024
The mean results for this SoC are:
Cluster 0 (LITTLE): 11990400 Dhry/s
Cluster 1 (BIG): 59809036 Dhry/s
The calculated scaled results are:
Cluster 0: 307,934312801831 (rounded to 308)
Cluster 1: 1024
Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20221005093404.33102-1-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The SEV kit reference design does not hook up the PCIe root port to the
core complex including it is misleading.
The entry is a re-use mistake - I was not aware of this when I moved
the PCIe node out of mpfs.dtsi so that individual bistreams could
connect it to different fics etc.
The node is disabled, so there should be no functional change here.
The GPIO interrupts are not working because of that. The toggling works
fine but interrupts are not firing. Fix the parent's input irq that
specifies the base for parent irq.
Tested for MAIN_GPIO0_6 interrupt on the j721s2 EVM.
arm_smmu_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() failed. Remove
the callback by cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
dmc620_pmu_init() won't remove the callback added by
cpuhp_setup_state_multi() when platform_driver_register() failed. Remove
the callback by cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Fixes: 53c218da220c ("driver/perf: Add PMU driver for the ARM DMC-620 memory controller") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20221115115540.6245-2-shangxiaojing@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
tad_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() failed. Remove the callback by
cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
dsu_pmu_init() won't remove the callback added by cpuhp_setup_state_multi()
when platform_driver_register() failed. Remove the callback by
cpuhp_remove_multi_state() in fail path.
Similar to the handling of arm_ccn_init() in commit 26242b330093 ("bus:
arm-ccn: Prevent hotplug callback leak")
Fixes: 7520fa99246d ("perf: ARM DynamIQ Shared Unit PMU support") Signed-off-by: Yuan Can <yuancan@huawei.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20221115070207.32634-2-yuancan@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Alexander noted that KFENCE only expects to handle faults from invalid page
table entries (i.e. translation faults), but arm64's fault handling logic will
call kfence_handle_page_fault() for other types of faults, including alignment
faults caused by unaligned atomics. This has the unfortunate property of
causing those other faults to be reported as "KFENCE: use-after-free",
which is misleading and hinders debugging.
Fix this by only forwarding unhandled translation faults to the KFENCE
code, similar to what x86 does already.
Alexander has verified that this passes all the tests in the KFENCE test
suite and avoids bogus reports on misaligned atomics.
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context.
The pm_runtime_enable will increase power disable depth. Thus
a pairing decrement is needed on the error handling path to
keep it balanced according to context.
\#pwm-cells for the Icicle kit's fabric PWM was incorrectly set to 2 &
blindly overridden by the (out of tree) driver anyway. The core can
support inverted operation, so update the entry to correctly report its
capabilities.
When the modem node was re-located to a separate LTE source file
"sc7280-herobrine-lte-sku.dtsi", some of the previous LTE users
weren't marked appropriately. Fix this by marking all Qualcomm
reference devices as LTE.
Suggested-by: Douglas Anderson <dianders@chromium.org> Fixes: d42fae738f3a ("arm64: dts: qcom: Add LTE SKUs for sc7280-villager family") Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221110070813.1777-1-quic_sibis@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
To enable error reporting for a fabric to CCPLEX, we need to write its
register for enabling error interrupt to CCPLEX during boot and later
clear the error status register after error occurs. If a fabric's
registers are protected and not accessible from CCPLEX, then accessing
the registers will cause CBB firewall error.
Add support to check whether write access from CCPLEX to the registers
of a fabric is not blocked by it's firewall before enabling error
reporting to CCPLEX for that fabric.
Added checks to avoid potential out of bounds errors which can happen if
the 'slave map' and 'CBB errors' arrays are not correct or latest where
some entries are missing.
In Tegra194 SoC, master_id bit range is different between cluster NOC
and CBB NOC. Currently same bit range is used which results in wrong
master_id value. Due to this, illegal accesses from the CCPLEX master
do not result in a crash as expected. Fix this by using the correct
range for the CBB NOC.
Finally, it is only necessary to extract the master_id when the
erd_mask_inband_err flag is set because when this is not set, a crash
is always triggered.
The preferred form for Renesas' compatible strings is:
"<vendor>,<family>-<module>"
Somehow the compatible string for the r9a09g011 I2C IP was upstreamed
as renesas,i2c-r9a09g011 instead of renesas,r9a09g011-i2c, which
is really confusing, especially considering the generic fallback
is renesas,rzv2m-i2c.
The first user of renesas,i2c-r9a09g011 in the kernel is not yet in
a kernel release, it will be in v6.1, therefore it can still be
fixed in v6.1.
Even if we don't fix it before v6.2, I don't think there is any
harm in making such a change.
s/renesas,i2c-r9a09g011/renesas,r9a09g011-i2c/g for consistency.
Although the HW User Manual for RZ/V2M states in the "Address Map"
section that the interrupt controller is assigned addresses starting
from 0x82000000, the memory locations from 0x82000000 0x0x8200FFFF
are marked as reserved in the "Interrupt Controller (GIC)" section
and are currently not used by the device tree, leading to the below
warning:
arch/arm64/boot/dts/renesas/r9a09g011.dtsi:51.38-63.5: Warning
(simple_bus_reg): /soc/interrupt-controller@82000000: simple-bus unit
address format error, expected "82010000"
As serial communication requires a clean clock signal, the Serial
Communication Interfaces with FIFO (SCIF) are clocked by a clock that is
not affected by Spread Spectrum or Fractional Multiplication.
Hence change the clock input for the SCIF Baud Rate Generator internal
clock from the S0D3_PER clock to the SASYNCPERD1 clock (which has the
same clock rate), cfr. R-Car S4-8 Hardware User's Manual rev. 0.81.
As serial communication requires a clean clock signal, the High Speed
Serial Communication Interfaces with FIFO (HSCIF) are clocked by a clock
that is not affected by Spread Spectrum or Fractional Multiplication.
Hence change the clock input for the HSCIF Baud Rate Generator internal
clock from the S0D3_PER clock to the SASYNCPERD1 clock (which has the
same clock rate), cfr. R-Car S4-8 Hardware User's Manual rev. 0.81.
As idr_alloc() and of_property_read_string_index() can return negative
numbers, it should be better to check the return value and deal with
the exception.
Therefore, it should be better to use goto statement to stop and return
error.