Unify the sequence of enabling the TRBE. We do this from
event_start and also from the TRBE IRQ handler. Lets move
this to a common helper. The only minor functional change
is returning an error when we fail to enable the TRBE.
This should be handled already.
Since we now have unique entry point to trying to enable TRBE,
move the format flag setting to the central place.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-9-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
We mark the buffer as TRUNCATED when there is no space left
in the buffer. But we do it at different points.
__trbe_normal_offset()
and also, at all the callers of the above function via
compute_trbe_buffer_limit(), when the limit == base (i.e
offset = 0 as returned by the __trbe_normal_offset()).
So, given that the callers already mark the buffer as TRUNCATED
drop the caller inside the __trbe_normal_offset().
This is in preparation to moving the handling of TRUNCATED
into a central place.
Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-6-suzuki.poulose@arm.com
[Moved comment as Anshuman requested] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
coresight: trbe: Ensure the format flag is always set
When the TRBE is stopped on truncating an event, we may not
set the FORMAT flag, even though the size of the record is 0.
Let us be consistent and not confuse the user.
To ensure that the format flag is always set on all the
records generated by TRBE, set the flag when we have a
new handle. Rather than deferring to the "end" operation,
which makes it clear. So, we can do this from
- arm_trbe_enable() -> When a new handle is provided by the
CoreSight PMU, triggered via etm_event_start()
- trbe_handle_overflow() -> When we begin a new handle after
closing the previous on overflow.
Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-5-suzuki.poulose@arm.com
[Fixed inverted words in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
coresight: etm-pmu: Ensure the AUX handle is valid
The ETM perf infrastructure closes out a handle during event_stop
or on an error in starting the event. In either case, it is possible
for a "sink" to update/close the handle, under certain circumstances.
(e.g no space in ring buffer.). So, ensure that we handle this
gracefully in the PMU driver by verifying the handle is still valid.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-4-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
coresight: etm4x: Use Trace Filtering controls dynamically
The Trace Filtering support (FEAT_TRF) ensures that the ETM
can be prohibited from generating any trace for a given EL.
This is much stricter knob, than the TRCVICTLR exception level
masks, which doesn't prevent the ETM from generating Context
packets for an "excluded" EL. At the moment, we do a onetime
enable trace at user and kernel and leave it untouched for the
kernel life time. This implies that the ETM could potentially
generate trace packets containing the kernel addresses, and
thus leaking the kernel virtual address in the trace.
This patch makes the switch dynamic, by honoring the filters
set by the user and enforcing them in the TRFCR controls.
We also rename the cpu_enable_tracing() appropriately to
cpu_detect_trace_filtering() and the drvdata member
trfc => trfcr to indicate the "value" of the TRFCR_EL1.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-3-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
When the CPU enters a low power mode, the TRFCR_EL1 contents could be
reset. Thus we need to save/restore the TRFCR_EL1 along with the ETM4x
registers to allow the tracing.
The TRFCR related helpers are in a new header file, as we need to use
them for TRBE in the later patches.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-2-suzuki.poulose@arm.com
[Fixed cosmetic details] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
James Clark [Wed, 22 Sep 2021 12:51:44 +0000 (13:51 +0100)]
coresight: Don't immediately close events that are run on invalid CPU/sink combos
When a traced process runs on a CPU that can't reach the selected sink,
the event will be stopped with PERF_HES_STOPPED. This means that even if
the process migrates to a valid CPU, tracing will not resume.
This can be reproduced (on N1SDP) by using taskset to start the process
on CPU 0, and then switching it to CPU 2 (ETF 1 is only reachable from
CPU 2):
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
This produces a single 0 length AUX record, and then no more trace:
After the fix, the same command produces normal AUX records. The perf
self test "89: Check Arm CoreSight trace data recording and synthesized
samples" no longer fails intermittently. This was because the taskset in
the test is after the fork, so there is a period where the task is
scheduled on a random CPU rather than forced to a valid one.
Specifically selecting an invalid CPU will still result in a failure to
open the event because it will never produce trace:
./perf record -C 2 -e cs_etm/@tmc_etf0/
failed to mmap with 12 (Cannot allocate memory)
The only scenario that has changed is if the CPU mask has a valid CPU
sink combo in it.
Testing
=======
* Coresight self test passes consistently:
./perf test Coresight
* CPU wide mode still produces trace:
./perf record -e cs_etm// -a
* Invalid -C options still fail to open:
./perf record -C 2,3 -e cs_etm/@tmc_etf0/
failed to mmap with 12 (Cannot allocate memory)
* Migrating a task to a valid sink/CPU now produces trace:
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
* If the task remains on an invalid CPU, no trace is emitted:
taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- ls
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210922125144.133872-2-james.clark@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Leo Yan [Sun, 5 Sep 2021 03:21:44 +0000 (11:21 +0800)]
coresight: tmc-etr: Speed up for bounce buffer in flat mode
The AUX bounce buffer is allocated with API dma_alloc_coherent(), in the
low level's architecture code, e.g. for Arm64, it maps the memory with
the attribution "Normal non-cacheable"; this can be concluded from the
definition for pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
Later when access the AUX bounce buffer, since the memory mapping is
non-cacheable, it's low efficiency due to every load instruction must
reach out DRAM.
This patch changes to allocate pages with dma_alloc_noncoherent(), the
driver can access the memory via cacheable mapping; therefore, load
instructions can fetch data from cache lines rather than always read
data from DRAM, the driver can boost memory performance. After using
the cacheable mapping, the driver uses dma_sync_single_for_cpu() to
invalidate cacheline prior to read bounce buffer so can avoid read stale
trace data.
By measurement the duration for function tmc_update_etr_buffer() with
ftrace function_graph tracer, it shows the performance significant
improvement for copying 4MiB data from bounce buffer:
Leo Yan [Sun, 12 Sep 2021 12:57:48 +0000 (20:57 +0800)]
coresight: Update comments for removing cs_etm_find_snapshot()
Commit 2f01c200d440 ("perf cs-etm: Remove callback cs_etm_find_snapshot()")
has removed the function cs_etm_find_snapshot() from the perf tool in the
user space, now CoreSight trace directly uses the perf common function
__auxtrace_mmap__read() to calcualte the head and size for AUX trace data
in snapshot mode.
This patch updates the comments in drivers to make them generic and not
stick to any specific function from perf tool.
Leo Yan [Sun, 12 Sep 2021 12:57:47 +0000 (20:57 +0800)]
coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
When enable the Arm CoreSight PMU event, the context for AUX ring buffer
is prepared in the structure perf_output_handle, and its field "head"
points the head of the AUX ring buffer and it is updated after filling
AUX trace data into buffer.
Current code uses an extra field etr_perf_buffer::head to maintain the
header for the AUX ring buffer which is not necessary; alternatively,
it's better to directly use perf_output_handle::head.
This patch removes the field etr_perf_buffer::head and directly uses
perf_output_handle::head for the head of AUX ring buffer.
Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210912125748.2816606-2-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Leo Yan [Mon, 9 Aug 2021 11:14:01 +0000 (19:14 +0800)]
coresight: tmc-etf: Add comment for store ordering
Since the function CS_LOCK() has contained memory barrier mb(), it
ensures the visibility of the AUX trace data before updating the
aux_head, thus it's needless to add any explicit barrier anymore.
Add comment to make clear for the barrier usage for ETF.
Leo Yan [Mon, 9 Aug 2021 11:14:00 +0000 (19:14 +0800)]
coresight: tmc-etr: Add barrier after updating AUX ring buffer
Since a memory barrier is required between AUX trace data store and
aux_head store, and the AUX trace data is filled with memcpy(), it's
sufficient to use smp_wmb() so can ensure the trace data is visible
prior to updating aux_head.
Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210809111407.596077-3-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
The current driver sets the write burst size initiated by TMC-ETR on
AXI bus to a fixed value of 16. Make this configurable by reading the
value specified in fwnode. If not specified, then default to 16.
Introduced a "max_burst_size" variable in tmc_drvdata structure to
facilitate this change.
Add "arm,max-burst-size" optional property for TMC ETR.
If specified, this value indicates the maximum burst size
that can be initiated by TMC on the AXI bus.
Brian Norris [Sat, 4 Sep 2021 01:28:54 +0000 (18:28 -0700)]
coresight: cpu-debug: Control default behavior via Kconfig
Debugfs is nice and so are module parameters, but
* debugfs doesn't take effect early (e.g., if drivers are locking up
before user space gets anywhere)
* module parameters either add a lot to the kernel command line, or
else take effect late as well (if you build =m and configure in
/etc/modprobe.d/)
So in the same spirit as these
CONFIG_PANIC_ON_OOPS (also available via cmdline or modparam)
CONFIG_INTEL_IOMMU_DEFAULT_ON (also available via cmdline)
add a new Kconfig option.
Tao Zhang [Thu, 19 Aug 2021 09:29:37 +0000 (17:29 +0800)]
coresight: cti: Correct the parameter for pm_runtime_put
The input parameter of the function pm_runtime_put should be the
same in the function cti_enable_hw and cti_disable_hw. The correct
parameter to use here should be dev->parent.
Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1629365377-5937-1-git-send-email-quic_taozha@quicinc.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Suzuki K Poulose [Tue, 19 Oct 2021 16:31:42 +0000 (17:31 +0100)]
arm64: errata: Add detection for TRBE write to out-of-range
Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where
the trbe, under some circumstances, might write upto 64bytes to an
address after the Limit as programmed by the TRBLIMITR_EL1.LIMIT.
This might -
- Corrupt a page in the ring buffer, which may corrupt trace from a
previous session, consumed by userspace.
- Hit the guard page at the end of the vmalloc area and raise a fault.
To keep the handling simpler, we always leave the last page from the
range, which TRBE is allowed to write. This can be achieved by ensuring
that we always have more than a PAGE worth space in the range, while
calculating the LIMIT for TRBE. And then the LIMIT pointer can be
adjusted to leave the PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE), out of the
TRBE range while enabling it. This makes sure that the TRBE will only
write to an area within its allowed limit (i.e, [head-head+size]) and
we do not have to handle address faults within the driver.
Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-5-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Suzuki K Poulose [Tue, 19 Oct 2021 16:31:41 +0000 (17:31 +0100)]
arm64: errata: Add workaround for TSB flush failures
Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
from errata, where a TSB (trace synchronization barrier)
fails to flush the trace data completely, when executed from
a trace prohibited region. In Linux we always execute it
after we have moved the PE to trace prohibited region. So,
we can apply the workaround every time a TSB is executed.
The work around is to issue two TSB consecutively.
NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
that a late CPU could be blocked from booting if it is the
first CPU that requires the workaround. This is because we
do not allow setting a cpu_hwcaps after the SMP boot. The
other alternative is to use "this_cpu_has_cap()" instead
of the faster system wide check, which may be a bit of an
overhead, given we may have to do this in nvhe KVM host
before a guest entry.
Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-4-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Suzuki K Poulose [Tue, 19 Oct 2021 16:31:40 +0000 (17:31 +0100)]
arm64: errata: Add detection for TRBE overwrite in FILL mode
Arm Neoverse-N2 and the Cortex-A710 cores are affected
by a CPU erratum where the TRBE will overwrite the trace buffer
in FILL mode. The TRBE doesn't stop (as expected in FILL mode)
when it reaches the limit and wraps to the base to continue
writing upto 3 cache lines. This will overwrite any trace that
was written previously.
Add the Neoverse-N2 erratum(#2139208) and Cortex-A710 erratum
(#2119858) to the detection logic.
This will be used by the TRBE driver in later patches to work
around the issue. The detection has been kept with the core
arm64 errata framework list to make sure :
- We don't duplicate the framework in TRBE driver
- The errata detection is advertised like the rest
of the CPU errata.
Note that the Kconfig entries are not fully active until the
TRBE driver implements the work around.
Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org>
cc: Leo Yan <leo.yan@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-3-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Five fixes for the ksmbd kernel server, including three security
fixes:
- remove follow symlinks support
- use LOOKUP_BENEATH to prevent out of share access
- SMB3 compounding security fix
- fix for returning the default streams correctly, fixing a bug when
writing ppt or doc files from some clients
- logging more clearly that ksmbd is experimental (at module load
time)"
* tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: use LOOKUP_BENEATH to prevent the out of share access
ksmbd: remove follow symlinks support
ksmbd: check protocol id in ksmbd_verify_smb_message()
ksmbd: add default data stream name in FILE_STREAM_INFORMATION
ksmbd: log that server is experimental at module load
Merge tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Fix two EDAC drivers using the wrong value type for the DIMM mode"
* tag 'edac_urgent_for_v5.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/dmc520: Assign the proper type to dimm->edac_mode
EDAC/synopsys: Fix wrong value type assignment for edac_mode
Merge tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal fixes from Daniel Lezcano:
- Fix thermal shutdown after a suspend/resume due to a wrong TCC value
restored on Intel platform (Antoine Tenart)
- Fix potential buffer overflow when building the list of policies. The
buffer size is not updated after writing to it (Dan Carpenter)
- Fix wrong check against IS_ERR instead of NULL (Ansuel Smith)
* tag 'thermal-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
thermal/drivers/int340x: Do not set a wrong tcc offset on resume
Merge tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Prevent sending the wrong signal when protection keys are enabled
and the kernel handles a fault in the vsyscall emulation.
- Invoke early_reserve_memory() before invoking e820_memory_setup()
which is required to make the Xen dom0 e820 hooks work correctly.
- Use the correct data type for the SETZ operand in the EMQCMDS
instruction wrapper.
- Prevent undefined behaviour to the potential unaligned accesss in
the instruction decoder library"
* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
x86/asm: Fix SETZ size enqcmds() build failure
x86/setup: Call early_reserve_memory() earlier
x86/fault: Fix wrong signal when vsyscall fails with pkey
Merge tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the recently introduced regression in posix CPU
timers which failed to stop the timer when requested. That caused
unexpected signals to be sent to the process/thread causing
malfunction"
* tag 'timers-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Prevent spuriously armed 0-value itimer
Merge tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Work around a bad GIC integration on a Renesas platform which can't
handle byte-sized MMIO access
- Plug a potential memory leak in the GICv4 driver
- Fix a regression in the Armada 370-XP IPI code which was caused by
issuing EOI instack of ACK.
- A couple of small fixes here and there"
* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Work around broken Renesas integration
irqchip/renesas-rza1: Use semicolons instead of commas
irqchip/gic-v3-its: Fix potential VPE leak on error
irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
irqchip/mbigen: Repair non-kernel-doc notation
irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
irqchip/armada-370-xp: Fix ack/eoi breakage
Documentation: Fix irq-domain.rst build warning
Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
debug, and pagemap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix uninitialized use in overcommit_policy_handler
mm/memory_failure: fix the missing pte_unmap() call
kasan: always respect CONFIG_KASAN_STACK
sh: pgtable-3level: fix cast to pointer from integer of different size
mm/debug: sync up latest migrate_reason to migrate_reason_names
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
mm: fs: invalidate bh_lrus for only cold path
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
tools/vm/page-types: remove dependency on opt_file for idle page tracking
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
ocfs2: drop acl cache for directories too
mm/shmem.c: fix judgment error in shmem_is_huge()
xtensa: increase size of gcc stack frame check
mm/damon: don't use strnlen() with known-bogus source length
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Thirty-three fixes, I'm afraid.
Essentially the build up from the last couple of weeks while I've been
dealling with Linux Plumbers conference infrastructure issues. It's
mostly the usual assortment of spelling fixes and minor corrections.
The only core relevant changes are to the sd driver to reduce the spin
up message spew and fix a small memory leak on the freeing path"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (33 commits)
scsi: ses: Retry failed Send/Receive Diagnostic commands
scsi: target: Fix spelling mistake "CONFLIFT" -> "CONFLICT"
scsi: lpfc: Fix gcc -Wstringop-overread warning, again
scsi: lpfc: Use correct scnprintf() limit
scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()
scsi: core: Remove 'current_tag'
scsi: acornscsi: Remove tagged queuing vestiges
scsi: fas216: Kill scmd->tag
scsi: qla2xxx: Restore initiator in dual mode
scsi: ufs: core: Unbreak the reset handler
scsi: sd_zbc: Support disks with more than 2**32 logical blocks
scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling"
scsi: bsg: Fix device unregistration
scsi: sd: Make sd_spinup_disk() less noisy
scsi: ufs: ufs-pci: Fix Intel LKF link stability
scsi: mpt3sas: Clean up some inconsistent indenting
scsi: megaraid: Clean up some inconsistent indenting
scsi: sr: Fix spelling mistake "does'nt" -> "doesn't"
scsi: Remove SCSI CDROM MAINTAINERS entry
scsi: megaraid: Fix Coccinelle warning
...
Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"This one looks a bit bigger than it is, but that's mainly because 2/3
of it is enabling IORING_OP_CLOSE to close direct file descriptors.
We've had a few folks using them and finding it confusing that the way
to close them is through using -1 for file update, this just brings
API symmetry for direct descriptors. Hence I think we should just do
this now and have a better API for 5.15 release. There's some room for
de-duplicating the close code, but we're leaving that for the next
merge window.
Outside of that, just small fixes:
- Poll race fixes (Hao)
- io-wq core dump exit fix (me)
- Reschedule around potentially intensive tctx and buffer iterators
on teardown (me)
- Fix for always ending up punting files update to io-wq (me)
- Put the provided buffer meta data under memcg accounting (me)
- Tweak for io_write(), removing dead code that was added with the
iterator changes in this release (Pavel)"
* tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
io_uring: make OP_CLOSE consistent with direct open
io_uring: kill extra checks in io_write()
io_uring: don't punt files update to io-wq unconditionally
io_uring: put provided buffer meta data under memcg accounting
io_uring: allow conditional reschedule for intensive iterators
io_uring: fix potential req refcount underflow
io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
io_uring: fix race between poll completion and cancel_hash insertion
io-wq: ensure we exit if thread group is exiting
Merge tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- keep ctrl->namespaces ordered (Christoph Hellwig)
- fix incorrect h2cdata pdu offset accounting in nvme-tcp (Sagi
Grimberg)
- handled updated hw_queues in nvme-fc more carefully (Daniel
Wagner, James Smart)
- md lock order fix (Christoph)
- fallocate locking fix (Ming)
- blktrace UAF fix (Zhihao)
- rq-qos bio tracking fix (Ming)
* tag 'block-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
block: hold ->invalidate_lock in blkdev_fallocate
blktrace: Fix uaf in blk_trace access after removing by sysfs
block: don't call rq_qos_ops->done_bio if the bio isn't tracked
md: fix a lock order reversal in md_alloc
nvme: keep ctrl->namespaces ordered
nvme-tcp: fix incorrect h2cdata pdu offset accounting
nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
nvme-fc: avoid race between time out and tear down
nvme-fc: update hardware queues before using them
Merge tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Some minor cleanups and fixes of some theoretical bugs, as well as a
fix of a bug introduced in 5.15-rc1"
* tag 'for-linus-5.15b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/x86: fix PV trap handling on secondary processors
xen/balloon: fix balloon kthread freezing
swiotlb-xen: this is PV-only on x86
xen/pci-swiotlb: reduce visibility of symbols
PCI: only build xen-pcifront in PV-enabled environments
swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests
Xen/gntdev: don't ignore kernel unmapping error
xen/x86: drop redundant zeroing from cpu_initialize_context()
Merge tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two bugfixes to fix the 4KiB blockmap chunk format availability and a
dangling pointer usage. There is also a trivial cleanup to clarify
compacted_2b if compacted_4b_initial > totalidx.
Summary:
- fix the dangling pointer use in erofs_lookup tracepoint
- fix unsupported chunk format check
- zero out compacted_2b if compacted_4b_initial > totalidx"
* tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clear compacted_2b if compacted_4b_initial > totalidx
erofs: fix misbehavior of unsupported chunk format check
erofs: fix up erofs_lookup tracepoint
Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Six small cifs/smb3 fixes, two for stable:
- important fix for deferred close (found by a git functional test)
related to attribute caching on close.
- four (two cosmetic, two more serious) small fixes for problems
pointed out by smatch via Dan Carpenter
- fix for comment formatting problems pointed out by W=1"
* tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix incorrect check for null pointer in header_assemble
smb3: correct server pointer dereferencing check to be more consistent
smb3: correct smb3 ACL security descriptor
cifs: Clear modified attribute bit from inode flags
cifs: Deal with some warnings from W=1
cifs: fix a sign extension bug
Merge tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for 5.15-rc3:
- greybus tty use-after-free bugfix
- r8188eu ioctl overlap build warning fix
Note, the r8188eu ioctl has been entirely removed for 5.16-rc1, but
it's good to get this fixed now for people using this in 5.15.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: fix -Wrestrict warnings
staging: greybus: uart: fix tty use after free
Merge tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some USB driver fixes and new device ids for 5.15-rc3.
They include:
- usb-storage quirk additions
- usb-serial new device ids
- usb-serial driver fixes
- USB roothub registration bugfix to resolve a long-reported issue
- usb gadget driver fixes for a large number of small things
- dwc2 driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
MAINTAINERS: usb, update Peter Korsgaard's entries
usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property
usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval
usb: gadget: f_uac2: Add missing companion descriptor for feedback EP
usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()
xhci: Set HCD flag to defer primary roothub registration
usb: core: hcd: Add support for deferring roothub registration
usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
usb: dwc3: core: balance phy init and exit
Revert "USB: bcma: Add a check for devm_gpiod_get"
...
Hyunchul Lee [Fri, 24 Sep 2021 15:06:16 +0000 (00:06 +0900)]
ksmbd: use LOOKUP_BENEATH to prevent the out of share access
instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.
ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Ralph Boehme <slow@samba.org> Tested-by: Steve French <smfrench@gmail.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, the asan-stack parameter is only passed along if
CFLAGS_KASAN_SHADOW is not empty, which requires KASAN_SHADOW_OFFSET to
be defined in Kconfig so that the value can be checked. In RISC-V's
case, KASAN_SHADOW_OFFSET is not defined in Kconfig, which means that
asan-stack does not get disabled with clang even when CONFIG_KASAN_STACK
is disabled, resulting in large stack warnings with allmodconfig:
drivers/video/fbdev/omap2/omapfb/displays/panel-lgphilips-lb035q02.c:117:12: error: stack frame size (14400) exceeds limit (2048) in function 'lb035q02_connect' [-Werror,-Wframe-larger-than]
static int lb035q02_connect(struct omap_dss_device *dssdev)
^
1 error generated.
Ensure that the value of CONFIG_KASAN_STACK is always passed along to
the compiler so that these warnings do not happen when
CONFIG_KASAN_STACK is disabled.
Link: https://github.com/ClangBuiltLinux/linux/issues/1453
References: 6baec880d7a5 ("kasan: turn off asan-stack for clang-8 and earlier") Link: https://lkml.kernel.org/r/20210922205525.570068-1-nathan@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sh: pgtable-3level: fix cast to pointer from integer of different size
If X2TLB=y (CPU_SHX2=y or CPU_SHX3=y, e.g. migor_defconfig), pgd_t.pgd
is "unsigned long long", causing:
In file included from arch/sh/include/asm/pgtable.h:13,
from include/linux/pgtable.h:6,
from include/linux/mm.h:33,
from arch/sh/kernel/asm-offsets.c:14:
arch/sh/include/asm/pgtable-3level.h: In function `pud_pgtable':
arch/sh/include/asm/pgtable-3level.h:37:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
37 | return (pmd_t *)pud_val(pud);
| ^
Fix this by adding an intermediate cast to "unsigned long", which is
basically what the old code did before.
Link: https://lkml.kernel.org/r/2c2eef3c9a2f57e5609100a4864715ccf253d30f.1631713483.git.geert+renesas@glider.be Fixes: 9cf6fa2458443118 ("mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Daniel Palmer <daniel@thingy.jp> Acked-by: Rob Landley <rob@landley.net> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Rich Felker <dalias@libc.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Jacopo Mondi <jacopo+renesas@jmondi.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Fri, 24 Sep 2021 22:43:47 +0000 (15:43 -0700)]
mm: fs: invalidate bh_lrus for only cold path
The kernel test robot reported the regression of fio.write_iops[1] with
commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration").
Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.
This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).
Zhengjun Xing confirmed
"I test the patch, the regression reduced to -2.9%"
Changbin Du [Fri, 24 Sep 2021 22:43:41 +0000 (15:43 -0700)]
tools/vm/page-types: remove dependency on opt_file for idle page tracking
Idle page tracking can also be used for process address space, not only
file mappings.
Without this change, using with '-i' option for process address space
encounters below errors reported.
$ sudo ./page-types -p $(pidof bash) -i
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
mark page idle: Bad file descriptor
...
Fix the following build failure reported in [1] by adding a conditional
definition of EM_RISCV in order to allow cross-compilation on machines
which do not have EM_RISCV definition in their host.
scripts/sorttable.c:352:7: error: use of undeclared identifier 'EM_RISCV'
EM_RISCV was added to <elf.h> in glibc 2.24 so builds on systems with
glibc headers < 2.24 should show this error.
[mkubecek@suse.cz: changelog addition] Link: https://lore.kernel.org/lkml/e8965b25-f15b-c7b4-748c-d207dda9c8e8@i2se.com/ Link: https://lkml.kernel.org/r/20210913030625.4525-1-miles.chen@mediatek.com Fixes: 54fed35fd393 ("riscv: Enable BUILDTIME_TABLE_SORT") Signed-off-by: Miles Chen <miles.chen@mediatek.com> Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Cc: Michal Kubecek <mkubecek@suse.cz> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Markus Mayer <mmayer@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wengang Wang [Fri, 24 Sep 2021 22:43:35 +0000 (15:43 -0700)]
ocfs2: drop acl cache for directories too
ocfs2_data_convert_worker() is currently dropping any cached acl info
for FILE before down-converting meta lock. It should also drop for
DIRECTORY. Otherwise the second acl lookup returns the cached one (from
VFS layer) which could be already stale.
The problem we are seeing is that the acl changes on one node doesn't
get refreshed on other nodes in the following case:
setfacl -m u:user1:rwX dir1
getfacl dir1 <-- see the change for user1
getfacl dir1 <-- can't see change for user1
Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu Yuntao [Fri, 24 Sep 2021 22:43:32 +0000 (15:43 -0700)]
mm/shmem.c: fix judgment error in shmem_is_huge()
In the case of SHMEM_HUGE_WITHIN_SIZE, the page index is not rounded up
correctly. When the page index points to the first page in a huge page,
round_up() cannot bring it to the end of the huge page, but to the end
of the previous one.
An example:
HPAGE_PMD_NR on my machine is 512(2 MB huge page size). After
allcoating a 3000 KB buffer, I access it at location 2050 KB. In
shmem_is_huge(), the corresponding index happens to be 512. After
rounded up by HPAGE_PMD_NR, it will still be 512 which is smaller than
i_size, and shmem_is_huge() will return true. As a result, my buffer
takes an additional huge page, and that shouldn't happen when
shmem_enabled is set to within_size.
Link: https://lkml.kernel.org/r/20210909032007.18353-1-liuyuntao10@huawei.com Fixes: f3f0e1d2150b2b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: wuxu.wu <wuxu.wu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xtensa frame size is larger than the frame size for almost all other
architectures. This results in more than 50 "the frame size of <n> is
larger than 1024 bytes" errors when trying to build xtensa:allmodconfig.
Increase frame size for xtensa to 1536 bytes to avoid compile errors due
to frame size limits.
Link: https://lkml.kernel.org/r/20210912025235.3514761-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marco Elver [Fri, 24 Sep 2021 22:43:23 +0000 (15:43 -0700)]
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
In the main KASAN config option CC_HAS_WORKING_NOSANITIZE_ADDRESS is
checked for instrumentation-based modes. However, if
HAVE_ARCH_KASAN_HW_TAGS is true all modes may still be selected.
To fix, also make the software modes depend on
CC_HAS_WORKING_NOSANITIZE_ADDRESS.
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Commit fcc00621d88b ("mm/hwpoison: retry with shake_page() for
unhandlable pages") changed the return value of __get_hwpoison_page() to
retry for transiently unhandlable cases. However, __get_hwpoison_page()
currently fails to properly judge buddy pages as handlable, so hard/soft
offline for buddy pages always fail as "unhandlable page". This is
totally regrettable.
So let's add is_free_buddy_page() in HWPoisonHandlable(), so that
__get_hwpoison_page() returns different return values between buddy
pages and unhandlable pages as intended.
Link: https://lkml.kernel.org/r/20210909004131.163221-1-naoya.horiguchi@linux.dev Fixes: fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Begunkov [Fri, 24 Sep 2021 19:04:29 +0000 (20:04 +0100)]
io_uring: make OP_CLOSE consistent with direct open
From recently open/accept are now able to manipulate fixed file table,
but it's inconsistent that close can't. Close the gap, keep API same as
with open/accept, i.e. via sqe->file_slot.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'gpio-fixes-for-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a regression in GPIO ACPI on HP ElitePad 1000 G2 where the
gpio_set_debounce_timeout() now returns a fatal error if the specific
debounce period is not supported by the driver instead of just
emitting a warning
- fix return values of irq_mask/unmask() callbacks in gpio-uniphier
- fix hwirq calculation in gpio-aspeed-sgpio
- fix two issues in gpio-rockchip: only make the extended debounce
support available for v2 and remove a redundant BIT() usage
* tag 'gpio-fixes-for-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio/rockchip: fix get_direction value handling
gpio/rockchip: extended debounce support is only available on v2
gpio: gpio-aspeed-sgpio: Fix wrong hwirq in irq handler.
gpio: uniphier: Fix void functions to remove return value
gpiolib: acpi: Make set-debounce-timeout failures non fatal
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- It turns out that the optimised string routines merged in 5.14 are
not safe with in-kernel MTE (KASAN_HW_TAGS) because of reading beyond
the end of a string (strcmp, strncmp). Such reading may go across a
16 byte tag granule and cause a tag check fault. When KASAN_HW_TAGS
is enabled, use the generic strcmp/strncmp C implementation.
- An errata workaround for ThunderX relied on the CPU capabilities
being enabled in a specific order. This disappeared with the
automatic generation of the cpucaps.h file (sorted alphabetically).
Fix it by checking the current CPU only rather than the system-wide
capability.
- Add system_supports_mte() checks on the kernel entry/exit path and
thread switching to avoid unnecessary barriers and function calls on
systems where MTE is not supported.
- kselftests: skip arm64 tests if the required features are missing.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Restore forced disabling of KPTI on ThunderX
kselftest/arm64: signal: Skip tests if required features are missing
arm64: Mitigate MTE issues with str{n}cmp()
arm64: add MTE supported check to thread switching and syscall entry/exit
Merge tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem fixes from Jan Kara:
"A for ext2 sleep in atomic context in case of some fs problems and a
cleanup of an invalidate_lock initialization"
* tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: fix sleeping in atomic bugs on error
mm: Fully initialize invalidate_lock, amend lock class later
Ming Lei [Thu, 23 Sep 2021 02:37:51 +0000 (10:37 +0800)]
block: hold ->invalidate_lock in blkdev_fallocate
When running ->fallocate(), blkdev_fallocate() should hold
mapping->invalidate_lock to prevent page cache from being accessed,
otherwise stale data may be read in page cache.
Without this patch, blktests block/009 fails sometimes. With this patch,
block/009 can pass always.
Also as Jan pointed out, no pages can be created in the discarded area
while you are holding the invalidate_lock, so remove the 2nd
truncate_bdev_range().
ioctl$SG_IO(/dev/sda, SG_IO, ...)
// Enters trace_note_tsk() after blk_trace_free() returned
// Use mdelay in rcu region rather than msleep(which may schedule out)
Remove blk_trace from running_list before calling blk_trace_free() by
sysfs if blk_trace is at Blktrace_running state.
Ming Lei [Fri, 24 Sep 2021 11:07:04 +0000 (19:07 +0800)]
block: don't call rq_qos_ops->done_bio if the bio isn't tracked
rq_qos framework is only applied on request based driver, so:
1) rq_qos_done_bio() needn't to be called for bio based driver
2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.
Especially in bio_endio():
1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases
2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()
Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.
Reported-by: Yu Kuai <yukuai3@huawei.com> Cc: tj@kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Fri, 24 Sep 2021 16:14:48 +0000 (17:14 +0100)]
io_uring: kill extra checks in io_write()
We don't retry short writes and so we would never get to async setup in
io_write() in that case. Thus ret2 > 0 is always false and
iov_iter_advance() is never used. Apparently, the same is found by
Coverity, which complains on the code.
io_uring: put provided buffer meta data under memcg accounting
For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.
io_uring: allow conditional reschedule for intensive iterators
If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.
iowq original context
io_poll_add
_arm_poll()
mask = vfs_poll() is not 0
if mask
(2) io_poll_complete()
compl_unlock
(interruption happens
tw queued to original
context)
io_poll_task_func()
compl_lock
(3) done = io_poll_complete() is true
compl_unlock
put req ref
(1) if (poll->flags & EPOLLONESHOT)
put req ref
EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple
combinations that can cause ref underfow.
Let's address it by:
- check the return value in (2) as done
- change (1) to if (done)
in this way, we only do ref put in (1) if 'oneshot flag' is from
(2)
- do poll.done check in io_poll_task_func(), so that we won't put ref
for the second time.
io_uring: fix race between poll completion and cancel_hash insertion
If poll arming and poll completion runs in parallel, there maybe races.
For instance, run io_poll_add in iowq and io_poll_task_func in original
context, then:
iowq original context
io_poll_add
vfs_poll
(interruption happens
tw queued to original
context) io_poll_task_func
generate cqe
del from cancel_hash[]
if !poll.done
insert to cancel_hash[]
The entry left in cancel_hash[], similar case for fast poll.
Fix it by set poll.done = true when del from cancel_hash[].
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and
identified the culprit in the Fixes line below. The problem is that
relying solely on fatal_signal_pending() to gate whether to exit or not
fails miserably if a process gets eg SIGILL sent. Don't exclusively
rely on fatal signals, also check if the thread group is exiting.
Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.15:
- keep ctrl->namespaces ordered (me)
- fix incorrect h2cdata pdu offset accounting in nvme-tcp
(Sagi Grimberg)
- handled updated hw_queues in nvme-fc more carefully (Daniel Wagner,
James Smart)"
* tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme:
nvme: keep ctrl->namespaces ordered
nvme-tcp: fix incorrect h2cdata pdu offset accounting
nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
nvme-fc: avoid race between time out and tear down
nvme-fc: update hardware queues before using them
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
Don't perform unaligned loads in __get_next() and __peek_nbyte_next() as
these are forms of undefined behavior:
"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer
is not correctly aligned for the pointed-to type, the behavior is
undefined."
Merge tag 'usb-serial-5.15-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.15-rc3
Here's a fix for a regression affecting some CP2102 devices and a host
of new device ids.
Included are also a couple of cleanups of duplicate device ids, which
are also tagged for stable to keep the tables in sync, and a trivial
patch to help debugging cp210x issues.
All have been in linux-next with no reported issues. Note however that
the last last two device-id commits were rebased to fix up a lore link
in a commit message (as the patch itself never made it to the list).
* tag 'usb-serial-5.15-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
USB: serial: option: add Telit LN920 compositions
Steve French [Fri, 24 Sep 2021 00:18:37 +0000 (19:18 -0500)]
cifs: fix incorrect check for null pointer in header_assemble
Although very unlikely that the tlink pointer would be null in this case,
get_next_mid function can in theory return null (but not an error)
so need to check for null (not for IS_ERR, which can not be returned
here).
Address warning:
fs/smbfs_client/connect.c:2392 cifs_match_super()
warn: 'tlink' isn't an ERR_PTR
Pointed out by Dan Carpenter via smatch code analysis tool
CC: stable@vger.kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 23 Sep 2021 23:52:40 +0000 (18:52 -0500)]
smb3: correct server pointer dereferencing check to be more consistent
Address warning:
fs/smbfs_client/misc.c:273 header_assemble()
warn: variable dereferenced before check 'treeCon->ses->server'
Pointed out by Dan Carpenter via smatch code analysis tool
Although the check is likely unneeded, adding it makes the code
more consistent and easier to read, as the same check is
done elsewhere in the function.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix for leak of transaction handle after verity rollback
failure
- properly reset device last error between mounts
- improve one error handling case when checksumming bios
- fixup confusing displayed size of space info free space
* tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prevent __btrfs_dump_space_info() to underflow its free space
btrfs: fix mount failure due to past and transient device flush error
btrfs: fix transaction handle leak after verity rollback failure
btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
Steve French [Thu, 23 Sep 2021 21:00:31 +0000 (16:00 -0500)]
smb3: correct smb3 ACL security descriptor
Address warning:
fs/smbfs_client/smb2pdu.c:2425 create_sd_buf()
warn: struct type mismatch 'smb3_acl vs cifs_acl'
Pointed out by Dan Carpenter via smatch code analysis tool
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'selinux-pr-20210923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux/Smack fixes from Paul Moore:
"Another single-patch pull request for SELinux, as well as Smack.
This fixes some credential misuse and is explained reasonably well in
the patch description"
* tag 'selinux-pr-20210923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux,smack: fix subjective/objective credential use mixups
Philip Yang [Mon, 20 Sep 2021 21:25:52 +0000 (17:25 -0400)]
drm/amdkfd: fix svm_migrate_fini warning
Device manager releases device-specific resources when a driver
disconnects from a device, devm_memunmap_pages and
devm_release_mem_region calls in svm_migrate_fini are redundant.
It causes below warning trace after patch "drm/amdgpu: Split
amdgpu_device_fini into early and late", so remove function
svm_migrate_fini.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 17 Sep 2021 18:32:14 +0000 (14:32 -0400)]
drm/amdkfd: handle svm migrate init error
If svm migration init failed to create pgmap for device memory, set
pgmap type to 0 to disable device SVM support capability.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/pm: Update intermediate power state for SI
Update the current state as boot state during dpm initialization.
During the subsequent initialization, set_power_state gets called to
transition to the final power state. set_power_state refers to values
from the current state and without current state populated, it could
result in NULL pointer dereference.
For ex: on platforms where PCI speed change is supported through ACPI
ATCS method, the link speed of current state needs to be queried before
deciding on changing to final power state's link speed. The logic to query
ATCS-support was broken on certain platforms. The issue became visible
when broken ATCS-support logic got fixed with commit f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)").
Philip Yang [Tue, 14 Sep 2021 20:33:40 +0000 (16:33 -0400)]
drm/amdkfd: fix dma mapping leaking warning
For xnack off, restore work dma unmap previous system memory page, and
dma map the updated system memory page to update GPU mapping, this is
not dma mapping leaking, remove the WARN_ONCE for dma mapping leaking.
prange->dma_addr store the VRAM page pfn after the range migrated to
VRAM, should not dma unmap VRAM page when updating GPU mapping or
remove prange. Add helper svm_is_valid_dma_mapping_addr to check VRAM
page and error cases.
Mask out SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling amdgpu vm
update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set it again
immediately after. This flag is used to know the type of page later to
dma unmapping system memory page.
Fixes: 1d5dbfe6c06a ("drm/amdkfd: classify and map mixed svm range pages in GPU") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 13 Sep 2021 14:03:36 +0000 (10:03 -0400)]
drm/amdkfd: SVM map to gpus check vma boundary
SVM range may includes multiple VMAs with different vm_flags, if prange
page index is the last page of the VMA offset + npages, update GPU
mapping to create GPU page table with same VMA access permission.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Using an empty macro expansion as a conditional expression
produces a W=1 warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c: In function 'dce_aux_transfer_with_retries':
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:775:156: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
775 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER");
| ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:783:155: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
783 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_NACK");
| ^
Expand it to "do { } while (0)" instead to make the expression
more robust and avoid the warning.
Fixes: 56aca2309301 ("drm/amd/display: Add AUX I2C tracing.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The memory semantics added in commit 437b38c51162 causes SystemMemory
Operation region, whose address range is not described in the EFI memory
map to be mapped as NormalNC memory on arm64 platforms (through
acpi_os_map_memory() in acpi_ex_system_memory_space_handler()).
This triggers the following abort on an ARM64 Ampere eMAG machine,
because presumably the physical address range area backing the Opregion
does not support NormalNC memory attributes driven on the bus.
If the Opregion address range is not present in the EFI memory map there
is no way for us to determine the memory attributes to use to map it -
defaulting to NormalNC does not work (and it is not correct on a memory
region that may have read side-effects) and therefore commit 437b38c51162 should be reverted, which means reverting back to the
original behavior whereby address ranges that are mapped using
acpi_os_map_memory() default to the safe devicenGnRnE attributes on
ARM64 if the mapped address range is not defined in the EFI memory map.
Fixes: 437b38c51162 ("ACPI: Add memory semantics to acpi_os_map_memory()") Signed-off-by: Jia He <justin.he@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull rseq fixes from Paolo Bonzini:
"A fix for a bug with restartable sequences and KVM.
KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
the flag without informing rseq and leads to stale data in userspace's
rseq struct"
* tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Remove __NR_userfaultfd syscall fallback
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
tools: Move x86 syscall number fallbacks to .../uapi/
entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest