Linus Torvalds [Fri, 27 Nov 2020 19:09:13 +0000 (11:09 -0800)]
Merge tag 'platform-drivers-x86-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- thinkpad_acpi fixes: two bug-fixes and three model specific quirks
- fixes for misc other drivers: two bug-fixes and three model specific
quirks
* tag 'platform-drivers-x86-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the Irbis TW118 tablet
platform/x86: touchscreen_dmi: Add info for the Predia Basic tablet
platform/x86: intel-vbtn: Support for tablet mode on HP Pavilion 13 x360 PC
platform/x86: toshiba_acpi: Fix the wrong variable assignment
platform/x86: acer-wmi: add automatic keyboard background light toggle key as KEY_LIGHTS_TOGGLE
platform/x86: thinkpad_acpi: Whitelist P15 firmware for dual fan control
platform/x86: thinkpad_acpi: Send tablet mode switch at wakeup time
platform/x86: thinkpad_acpi: Add BAT1 is primary battery quirk for Thinkpad Yoga 11e 4th gen
platform/x86: thinkpad_acpi: Do not report SW_TABLET_MODE on Yoga 11e
platform/x86: thinkpad_acpi: add P1 gen3 second fan support
Linus Torvalds [Fri, 27 Nov 2020 19:04:13 +0000 (11:04 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace
S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call
x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
Linus Torvalds [Fri, 27 Nov 2020 18:59:02 +0000 (10:59 -0800)]
Merge tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.10:
- regression fix for a boot failure on some 32-bit machines.
- fix for host crashes in the KVM system reset handling.
- fix for a possible oops in the KVM XIVE interrupt handling on
Power9.
- fix for host crashes triggerable via the KVM emulated MMIO handling
when running HPT guests.
- a couple of small build fixes.
Thanks to Andreas Schwab, Cédric Le Goater, Christophe Leroy, Erhard
Furtner, Greg Kurz, Greg Kurz, Németh Márton, Nicholas Piggin, Nick
Desaulniers, Serge Belyshev, and Stephen Rothwell"
* tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix allnoconfig build since uaccess flush
powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context
powerpc: Drop -me200 addition to build flags
KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y
powerpc/32s: Use relocation offset when setting early hash table
Linus Torvalds [Fri, 27 Nov 2020 18:44:59 +0000 (10:44 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main changes are relating to our handling of access/dirty bits,
where our low-level page-table helpers could lead to stale young
mappings and loss of the dirty bit in some cases (the latter has not
been observed in practice, but could happen when clearing "soft-dirty"
if we enabled that). These were posted as part of a larger series, but
the rest of that is less urgent and needs a v2 which I'll get to
shortly.
In other news, we've now got a set of fixes to resolve the
lockdep/tracing problems that have been plaguing us for a while, but
they're still a bit "fresh" and I plan to send them to you next week
after we've got some more confidence in them (although initial CI
results look good).
Summary:
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
arm64: pgtable: Fix pte_accessible()
ACPI/IORT: Fix doc warnings in iort.c
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
Linus Torvalds [Fri, 27 Nov 2020 18:41:19 +0000 (10:41 -0800)]
Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"Here's another round of IOMMU fixes for -rc6 consisting mainly of a
bunch of independent driver fixes. Thomas agreed for me to take the
x86 'tboot' fix here, as it fixes a regression introduced by a vt-d
change.
- Fix intel iommu driver when running on devices without VCCAP_REG
- Fix swiotlb and "iommu=pt" interaction under TXT (tboot)
- Fix missing return value check during device probe()
- Fix probe ordering for Qualcomm SMMU implementation
- Ensure page-sized mappings are used for AMD IOMMU buffers with SNP
RMP"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Don't read VCCAP register unless it exists
x86/tboot: Don't disable swiotlb when iommu is forced on
iommu: Check return of __iommu_attach_device()
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
Linus Torvalds [Fri, 27 Nov 2020 18:38:36 +0000 (10:38 -0800)]
Merge tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- do not lose trailing newline in pr_cont() calls
- two trivial fixes for a dead store and a config description
* tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: finalize records with trailing newlines
printk: remove unneeded dead-store assignment
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.
The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some case we don't want to
use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48'
kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used,
the corresponding SPTE will fail '__is_rsvd_bits_set()' check.
Revert to using 'shadow_root_level'.
Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 27 Nov 2020 08:18:20 +0000 (09:18 +0100)]
KVM: x86: Fix split-irqchip vs interrupt injection window request
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works. But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?
In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.
In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu). However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts. In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false. When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring. The VM loops without any hope of making progress.
we realize two things. First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt
window request in vcpu_enter_guest()
should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace. Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT. It all makes sense:
- we can accept an interrupt from userspace if there is a place
to stash it (and, for irqchip split, ExtINTs are not masked).
Interrupts from userspace _can_ be accepted even if right now
EFLAGS.IF=0.
- in order to tell userspace we will inject its interrupt ("IRQ
window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
KVM and the vCPU need to be ready to accept the interrupt.
... and this is what the patch implements.
Reported-by: David Woodhouse <dwmw@amazon.co.uk> Analyzed-by: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Paolo Bonzini [Fri, 27 Nov 2020 07:53:52 +0000 (08:53 +0100)]
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
Centralize handling of interrupts from the userspace APIC
in kvm_cpu_has_extint and kvm_cpu_get_extint, since
userspace APIC interrupts are handled more or less the
same as ExtINTs are with split irqchip. This removes
duplicated code from kvm_cpu_has_injectable_intr and
kvm_cpu_has_interrupt, and makes the code more similar
between kvm_cpu_has_{extint,interrupt} on one side
and kvm_cpu_get_{extint,interrupt} on the other.
Cc: stable@vger.kernel.org Reviewed-by: Filippo Sironi <sironi@amazon.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
John Ogness [Thu, 26 Nov 2020 11:48:36 +0000 (12:54 +0106)]
printk: finalize records with trailing newlines
Any record with a trailing newline (LOG_NEWLINE flag) cannot
be continued because the newline has been stripped and will
not be visible if the message is appended. This was already
handled correctly when committing in log_output() but was
not handled correctly when committing in log_store().
Fixes: f5f022e53b87 ("printk: reimplement log_cont using record extension") Link: https://lore.kernel.org/r/20201126114836.14750-1-john.ogness@linutronix.de Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
Feng Tang [Wed, 25 Nov 2020 05:22:21 +0000 (13:22 +0800)]
mm: memcg: relayout structure mem_cgroup to avoid cache interference
0day reported one -22.7% regression for will-it-scale page_fault2
case [1] on a 4 sockets 144 CPU platform, and bisected to it to be
caused by Waiman's optimization (commit bd0b230fe1) of saving one
'struct page_counter' space for 'struct mem_cgroup'.
Initially we thought it was due to the cache alignment change introduced
by the patch, but further debug shows that it is due to some hot data
members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent
cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch
is enabled, it triggers an "extended level" of cache false sharing for
2 adjacent cache lines.
So exchange the 2 member blocks, while keeping mostly the original
cache alignment, which can restore and even enhance the performance,
and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816,
with 0day's default RHEL-8.3 kernel config)
David Woodhouse [Thu, 26 Nov 2020 11:13:51 +0000 (11:13 +0000)]
iommu/vt-d: Don't read VCCAP register unless it exists
My virtual IOMMU implementation is whining that the guest is reading a
register that doesn't exist. Only read the VCCAP_REG if the corresponding
capability is set in ECAP_REG to indicate that it actually exists.
Fixes: 3375303e8287 ("iommu/vt-d: Add custom allocator for IOASID") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Cc: stable@vger.kernel.org # v5.7+ Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/de32b150ffaa752e0cff8571b17dfb1213fbe71c.camel@infradead.org Signed-off-by: Will Deacon <will@kernel.org>
Max Verevkin [Tue, 24 Nov 2020 13:16:52 +0000 (15:16 +0200)]
platform/x86: intel-vbtn: Support for tablet mode on HP Pavilion 13 x360 PC
The Pavilion 13 x360 PC has a chassis-type which does not indicate it is
a convertible, while it is actually a convertible. Add it to the
dmi_switches_allow_list.
Kaixu Xia [Sun, 22 Nov 2020 05:49:37 +0000 (13:49 +0800)]
platform/x86: toshiba_acpi: Fix the wrong variable assignment
The commit 78429e55e4057 ("platform/x86: toshiba_acpi: Clean up
variable declaration") cleans up variable declaration in
video_proc_write(). Seems it does the variable assignment in the
wrong place, this results in dead code and changes the source code
logic. Fix it by doing the assignment at the beginning of the funciton.
Fixes: 78429e55e4057 ("platform/x86: toshiba_acpi: Clean up variable declaration") Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Link: https://lore.kernel.org/r/1606024177-16481-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Benjamin Berg [Mon, 23 Nov 2020 13:21:57 +0000 (14:21 +0100)]
platform/x86: thinkpad_acpi: Send tablet mode switch at wakeup time
The lid state may change while the machine is suspended. As such, we may
need to re-check the state at wake-up time (at least when waking up from
hibernation).
Add the appropriate call to the resume handler in order to sync the
SW_TABLET_MODE switch state with the hardware state.
Hans de Goede [Mon, 9 Nov 2020 10:35:50 +0000 (11:35 +0100)]
platform/x86: thinkpad_acpi: Add BAT1 is primary battery quirk for Thinkpad Yoga 11e 4th gen
The Thinkpad Yoga 11e 4th gen with the N3450 / Celeron CPU only has
one battery which is named BAT1 instead of the expected BAT0, add a
quirk for this. This fixes not being able to set the charging tresholds
on this model; and this alsoe fixes the following errors in dmesg:
ACPI: \_SB_.PCI0.LPCB.EC__.HKEY: BCTG evaluated but flagged as error
thinkpad_acpi: Error probing battery 2
battery: extension failed to load: ThinkPad Battery Extension
battery: extension unregistered: ThinkPad Battery Extension
Note that the added quirk is for the "R0K" BIOS versions which are
used on the Thinkpad Yoga 11e 4th gen's with a Celeron CPU, there
is a separate "R0L" BIOS for the i3/i5 based versions. This may also
need the same quirk, but if that really is necessary is unknown.
Hans de Goede [Fri, 6 Nov 2020 14:01:30 +0000 (15:01 +0100)]
platform/x86: thinkpad_acpi: Do not report SW_TABLET_MODE on Yoga 11e
The Yoga 11e series has 2 accelerometers described by a BOSC0200 ACPI node.
This setup relies on a Windows service which reads both accelerometers and
then calculates the angle between the 2 halves to determine laptop / tent /
tablet mode and then reports the calculated mode back to the EC by calling
special ACPI methods on the BOSC0200 node.
The bmc150 iio driver does not support this (it involves double
calculations requiring sqrt and arccos so this really needs to be done
in userspace), as a result of this on the Yoga 11e the thinkpad_acpi
code always reports SW_TABLET_MODE=0, starting with GNOME 3.38 reporting
SW_TABLET_MODE=0 causes GNOME to:
1. Not show the onscreen keyboard when a text-input field is focussed
with the touchscreen.
2. Disable accelerometer based auto display-rotation.
This makes sense when in laptop-mode but not when in tablet-mode. But
since for the Yoga 11e the thinkpad_acpi code always reports
SW_TABLET_MODE=0, GNOME does not know when the device is in tablet-mode.
Stop reporting the broken (always 0) SW_TABLET_MODE on Yoga 11e models
to fix this.
Note there are plans for userspace to support 360 degree hinges style
2-in-1s with 2 accelerometers and figure out the mode by itself, see:
https://gitlab.freedesktop.org/hadess/iio-sensor-proxy/-/issues/216
platform/x86: thinkpad_acpi: add P1 gen3 second fan support
Tested on my P1 gen3, works fine with `thinkfan`. Since thinkpad_acpi fan
control is off by default, it is safe to add 2nd fan control for brave
overclockers
Linus Torvalds [Wed, 25 Nov 2020 18:35:44 +0000 (10:35 -0800)]
Merge tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a rand Kconfig fixup for mtk-vcodec
- a fix at h264 handling at cedrus codec driver
- some warning fixes when config PM is not enabled at marvell-ccic
- two fixes at venus codec driver: one related to codec profile and the
other one related to a bad error path which causes an OOPS on module
re-bind
* tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: venus: pm_helpers: Fix kernel module reload
media: venus: venc: Fix setting of profile and level
media: cedrus: h264: Fix check for presence of scaling matrix
media: media/platform/marvell-ccic: fix warnings when CONFIG_PM is not enabled
media: mtk-vcodec: fix build breakage when one of VPU or SCP is enabled
media: mtk-vcodec: move firmware implementations into their own files
Lu Baolu [Wed, 25 Nov 2020 01:41:24 +0000 (09:41 +0800)]
x86/tboot: Don't disable swiotlb when iommu is forced on
After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
domain"), swiotlb could also be used for direct memory access if IOMMU
is enabled but a device is configured to pass through the DMA translation.
Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
if "iommu=pt" kernel parameter is used.
Fixes: 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA domain") Reported-and-tested-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201125014124.4070776-1-baolu.lu@linux.intel.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237 Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Tue, 24 Nov 2020 23:33:18 +0000 (15:33 -0800)]
Merge tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four smb3 fixes for stable: one fixes a memleak, the other three
address a problem found with decryption offload that can cause a use
after free"
* tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Handle error case during offload read path
smb3: Avoid Mid pending list corruption
smb3: Call cifs reconnect from demultiplex thread
cifs: fix a memleak with modefromsid
Hugh Dickins [Tue, 24 Nov 2020 16:46:43 +0000 (08:46 -0800)]
mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.
The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.
https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
Then on crashing a second time, realized there's a stronger reason against
that approach. If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()? And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon? What
would that look like?
It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).
Matthew Wilcox explaining this to himself:
"page is allocated, added to page cache, dirtied, writeback starts,
--- thread A ---
filesystem calls end_page_writeback()
test_clear_page_writeback()
--- context switch to thread B ---
truncate_inode_pages_range() finds the page, it doesn't have writeback set,
we delete it from the page cache. Page gets reallocated, dirtied, writeback
starts again. Then we call write_cache_pages(), see
PageWriteback() set, call wait_on_page_writeback()
--- context switch back to thread A ---
wake_up_page(page, PG_writeback);
... thread B is woken, but because the wakeup was for the old use of
the page, PageWriteback is still set.
Devious"
And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.
I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().
Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared? I think not,
because each waiter does hold a reference on the page. This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it). The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
Linus Torvalds [Tue, 24 Nov 2020 20:12:55 +0000 (12:12 -0800)]
Merge tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"A couple more stack unwinder related fixes:
- More stack unwinding updates
- Misc minor fixes"
* tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: stack unwinding: reorganize how initial register state setup
ARC: stack unwinding: don't assume non-current task is sleeping
ARC: mm: fix spelling mistakes
ARC: bitops: Remove unecessary operation and value
Will Deacon [Fri, 20 Nov 2020 13:57:48 +0000 (13:57 +0000)]
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
Cc: <stable@vger.kernel.org> Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Fri, 20 Nov 2020 13:28:01 +0000 (13:28 +0000)]
arm64: pgtable: Fix pte_accessible()
pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.
Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.
Shameer Kolothum [Thu, 19 Nov 2020 16:58:46 +0000 (16:58 +0000)]
iommu: Check return of __iommu_attach_device()
Currently iommu_create_device_direct_mappings() is called
without checking the return of __iommu_attach_device(). This
may result in failures in iommu driver if dev attach returns
error.
John Stultz [Thu, 12 Nov 2020 22:05:19 +0000 (22:05 +0000)]
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
Robin Murphy pointed out that if the arm-smmu driver probes before
the qcom_scm driver, we may call qcom_scm_qsmmu500_wait_safe_toggle()
before the __scm is initialized.
Now, getting this to happen is a bit contrived, as in my efforts it
required enabling asynchronous probing for both drivers, moving the
firmware dts node to the end of the dtsi file, as well as forcing a
long delay in the qcom_scm_probe function.
With those tweaks we ran into the following crash:
[ 2.631040] arm-smmu 15000000.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[ 2.633372] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[ 2.633402] [0000000000000000] user address but active_mm is swapper
[ 2.633409] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.633415] Modules linked in:
[ 2.633427] CPU: 5 PID: 117 Comm: kworker/u16:2 Tainted: G W 5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3971
[ 2.633430] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 2.633448] Workqueue: events_unbound async_run_entry_fn
[ 2.633456] pstate: 80c00005 (Nzcv daif +PAN +UAO -TCO BTYPE=--)
[ 2.633465] pc : qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633473] lr : qcom_smmu500_reset+0x58/0x78
[ 2.633476] sp : ffffffc0105a3b60
...
[ 2.633567] Call trace:
[ 2.633572] qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633576] qcom_smmu500_reset+0x58/0x78
[ 2.633581] arm_smmu_device_reset+0x194/0x270
[ 2.633585] arm_smmu_device_probe+0xc94/0xeb8
[ 2.633592] platform_drv_probe+0x58/0xa8
[ 2.633597] really_probe+0xec/0x398
[ 2.633601] driver_probe_device+0x5c/0xb8
[ 2.633606] __driver_attach_async_helper+0x64/0x88
[ 2.633610] async_run_entry_fn+0x4c/0x118
[ 2.633617] process_one_work+0x20c/0x4b0
[ 2.633621] worker_thread+0x48/0x460
[ 2.633628] kthread+0x14c/0x158
[ 2.633634] ret_from_fork+0x10/0x18
[ 2.633642] Code: a9034fa0d0007f7329107fa091342273 (f9400020)
To avoid this, this patch adds a check on qcom_scm_is_available() in
the qcom_smmu_impl_init() function, returning -EPROBE_DEFER if its
not ready.
This allows the driver to try to probe again later after qcom_scm has
finished probing.
Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Andy Gross <agross@kernel.org> Cc: Maulik Shah <mkshah@codeaurora.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Lina Iyer <ilina@codeaurora.org> Cc: iommu@lists.linux-foundation.org Cc: linux-arm-msm <linux-arm-msm@vger.kernel.org> Link: https://lore.kernel.org/r/20201112220520.48159-1-john.stultz@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
Suravee Suthikulpanit [Thu, 5 Nov 2020 14:58:32 +0000 (14:58 +0000)]
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
and the completion wait write-back regions. However, when allocating
the pages, they could be part of large mapping (e.g. 2M) page.
This causes #PF due to the SNP RMP hardware enforces the check based
on the page level for these data structures.
So, fix by calling set_memory_4k() on the allocated pages.
Fixes: c69d89aff393 ("iommu/amd: Use 4K page for completion wait write-back semaphore") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: https://lore.kernel.org/r/20201105145832.3065-1-suravee.suthikulpanit@amd.com Signed-off-by: Will Deacon <will@kernel.org>
Shiju Jose [Wed, 14 Oct 2020 09:31:39 +0000 (10:31 +0100)]
ACPI/IORT: Fix doc warnings in iort.c
Fix following warnings caused by mismatch between
function parameters and function comments.
drivers/acpi/arm64/iort.c:55: warning: Function parameter or member 'iort_node' not described in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:55: warning: Excess function parameter 'node' description in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'id' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'bus_token' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Excess function parameter 'req_id' description in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:1142: warning: Function parameter or member 'dma_size' not described in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1142: warning: Excess function parameter 'size' description in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1534: warning: Function parameter or member 'ops' not described in 'iort_add_platform_device'
Randy Dunlap [Mon, 23 Nov 2020 04:45:10 +0000 (20:45 -0800)]
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t' so the latter needs to
#include <asm/insn.h> for this typedef.
Fixes this build error:
In file included from arch/arm64/include/asm/kprobes.h:24,
from arch/arm64/include/asm/exception.h:11,
from arch/arm64/kernel/fpsimd.c:35:
arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
16 | pstate_check_t *pstate_cc;
Fixes: c6b90d5cf637 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org Signed-off-by: Will Deacon <will@kernel.org>
Sven Schnelle [Fri, 20 Nov 2020 13:17:52 +0000 (14:17 +0100)]
s390: fix fpu restore in entry.S
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.
The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.
Cc: <stable@vger.kernel.org> # 5.8 Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S") Reported-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Stephen Rothwell [Mon, 23 Nov 2020 07:40:16 +0000 (18:40 +1100)]
powerpc/64s: Fix allnoconfig build since uaccess flush
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.
Otherwise the build fails with eg:
arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
Michael Ellerman [Mon, 23 Nov 2020 10:16:27 +0000 (21:16 +1100)]
Merge tag 'powerpc-cve-2020-4788' into fixes
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
Sudeep Holla [Fri, 20 Nov 2020 10:12:52 +0000 (10:12 +0000)]
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a
dummy clock provider") registers a dummy clock provider using
devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
dependency, but since we plan to move away from the clock dependency
for scmi cpufreq, it is preferrable to avoid that.
Let us just conditionally compile out the offending call to
devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside
of the #ifdef block to avoid build warning.
Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Linus Torvalds [Sun, 22 Nov 2020 22:36:06 +0000 (14:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- Various functionality / regression fixes for Logitech devices from
Hans de Goede
- Fix for (recently added) GPIO support in mcp2221 driver from Lars
Povlsen
- Power management handling fix/quirk in i2c-hid driver for certain
BIOSes that have strange aproach to power-cycle from Hans de Goede
- a few device ID additions and device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
HID: Add Logitech Dinovo Edge battery quirk
HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
HID: mcp2221: Fix GPIO output handling
HID: hid-sensor-hub: Fix issue with devices with no report ID
HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
HID: add support for Sega Saturn
HID: cypress: Support Varmilo Keyboards' media hotkeys
HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
HID: logitech-hidpp: Add PID for MX Anywhere 2
HID: uclogic: Add ID for Trust Flex Design Tablet
Linus Torvalds [Sun, 22 Nov 2020 21:26:07 +0000 (13:26 -0800)]
Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- Make the conditional update of the overutilized state work
correctly by caching the relevant flags state before overwriting
them and checking them afterwards.
- Fix a data race in the wakeup path which caused loadavg on ARM64
platforms to become a random number generator.
- Fix the ordering of the iowaiter accounting operations so it can't
be decremented before it is incremented.
- Fix a bug in the deadline scheduler vs. priority inheritance when a
non-deadline task A has inherited the parameters of a deadline task
B and then blocks on a non-deadline task C.
The second inheritance step used the static deadline parameters of
task A, which are usually 0, instead of further propagating task
B's parameters. The zero initialized parameters trigger a bug in
the deadline scheduler"
* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix priority inheritance with multiple scheduling classes
sched: Fix rq->nr_iowait ordering
sched: Fix data-race in wakeup
sched/fair: Fix overutilized update in enqueue_task_fair()
Linus Torvalds [Sun, 22 Nov 2020 21:23:43 +0000 (13:23 -0800)]
Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes and therefore making clang's
control flow integrity checker upset"
* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: fix sysfs type mismatches
Linus Torvalds [Sun, 22 Nov 2020 21:19:53 +0000 (13:19 -0800)]
Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single fix for lockdep which makes the recursion protection cover
graph lock/unlock"
* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Put graph lock/unlock under lock_recursion protection
Linus Torvalds [Sun, 22 Nov 2020 21:05:48 +0000 (13:05 -0800)]
Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built
in Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure"
* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Free efi_pgd with free_pages()
efivarfs: fix memory leak in efivarfs_create()
efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
Linus Torvalds [Sun, 22 Nov 2020 20:55:50 +0000 (12:55 -0800)]
Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
same because the proper one is going through the IOMMU tree (Thomas
Gleixner)
- An Intel microcode loader fix to save the correct microcode patch to
apply during resume (Chen Yu)
- A fix to not access user memory of other processes when dumping
opcode bytes (Thomas Gleixner)
* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
x86/dumpstack: Do not try to access user space code of other tasks
x86/microcode/intel: Check patch signature before saving microcode for early loading
iommu/vt-d: Take CONFIG_PCI_ATS into account
Linus Torvalds [Sun, 22 Nov 2020 20:14:46 +0000 (12:14 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (madvise, pagemap,
readahead, memcg, userfaultfd), kbuild, and vfs"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix madvise WILLNEED performance problem
libfs: fix error cast of negative value in simple_attr_write()
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
mm: memcg/slab: fix root memcg vmstats
mm: fix readahead_page_batch for retry entries
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
compiler-clang: remove version check for BPF Tracing
mm/madvise: fix memory leak from process_madvise
Linus Torvalds [Sun, 22 Nov 2020 19:58:49 +0000 (11:58 -0800)]
Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO fixes from Greg KH:
"Here are some small Staging and IIO driver fixes for 5.10-rc5. They
include:
- IIO fixes for reported regressions and problems
- new device ids for IIO drivers
- new device id for rtl8723bs driver
- staging ralink driver Kconfig dependency fix
- staging mt7621-pci bus resource fix
All of these have been in linux-next all week with no reported issues"
* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
iio: light: fix kconfig dependency bug for VCNL4035
iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
staging: mt7621-pci: avoid to request pci bus resources
iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
counter/ti-eqep: Fix regmap max_register
iio: adc: stm32-adc: fix a regression when using dma and irq
iio: adc: mediatek: fix unset field
iio: cros_ec: Use default frequencies when EC returns invalid information
Linus Torvalds [Sun, 22 Nov 2020 19:52:10 +0000 (11:52 -0800)]
Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are some small tty/serial fixes for 5.10-rc5 that resolve some
reported issues:
- speakup crash when telling the kernel to use a device that isn't
really there
- imx serial driver fixes for reported problems
- ar933x_uart driver fix for probe error handling path
All have been in linux-next for a while with no reported issues"
* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: ar933x_uart: disable clk on error handling path in probe
tty: serial: imx: keep console clocks always on
speakup: Do not let the line discipline be used several times
tty: serial: imx: fix potential deadlock
Linus Torvalds [Sun, 22 Nov 2020 19:39:32 +0000 (11:39 -0800)]
Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bogus warning in ext4_update_dx_flag()
jbd2: fix kernel-doc markups
ext4: drop fast_commit from /proc/mounts
David Howells [Sun, 22 Nov 2020 13:13:45 +0000 (13:13 +0000)]
afs: Fix speculative status fetch going out of order wrt to modifications
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.
To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).
When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.
A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.
This results in something like the following being seen in dmesg:
showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.
Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.
Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yicong Yang [Sun, 22 Nov 2020 06:17:19 +0000 (22:17 -0800)]
libfs: fix error cast of negative value in simple_attr_write()
The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion. It will lead to the error cast if user inputs a
negative value.
Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value. The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly. Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.
Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerald Schaefer [Sun, 22 Nov 2020 06:17:15 +0000 (22:17 -0800)]
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases. In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm. The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
Memory state around the buggy address: 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Sun, 22 Nov 2020 06:17:12 +0000 (22:17 -0800)]
mm: memcg/slab: fix root memcg vmstats
If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for
the root memcg. Now this at least affects the vmstat of
NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is
smaller than the PAGE_SIZE.
David said:
"I assume that without this fix that the root memcg's vmstat would
always be inflated if we reparented"
Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Cc: <stable@vger.kernel.org> [5.3+] Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox (Oracle) [Sun, 22 Nov 2020 06:17:08 +0000 (22:17 -0800)]
mm: fix readahead_page_batch for retry entries
Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for. This was caused by a missing
check in the iterator.
As can be seen in the below panic log, the accessing 0x402 causes a
panic. In the xarray.h, 0x402 means RETRY_ENTRY.
Dan Williams [Sun, 22 Nov 2020 06:17:05 +0000 (22:17 -0800)]
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Desaulniers [Sun, 22 Nov 2020 06:17:01 +0000 (22:17 -0800)]
compiler-clang: remove version check for BPF Tracing
bpftrace parses the kernel headers and uses Clang under the hood.
Remove the version check when __BPF_TRACING__ is defined (as bpftrace
does) so that this tool can continue to parse kernel headers, even with
older clang sources.
Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1") Reported-by: Chen Yu <yu.chen.surf@gmail.com> Reported-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 21 Nov 2020 18:36:25 +0000 (10:36 -0800)]
Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
code on 32-bit arm last week; and a revert of the rmap key comparison
change from last week as it was totally wrong. I need a vacation. :(
Summary:
- Fix various deficiencies in online fsck's metadata checking code
- Fix an integer casting bug in the xattr code on 32-bit systems
- Fix a hang in an inode walk when the inode index is corrupt
- Fix error codes being dropped when initializing per-AG structures
- Fix nowait directio writes that partially succeed but return EAGAIN
- Revert last week's rmap comparison patch because it was wrong"
* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert "xfs: fix rmap key and record comparison functions"
xfs: don't allow NOWAIT DIO across extent boundaries
xfs: return corresponding errcode if xfs_initialize_perag() fail
xfs: ensure inobt record walks always make forward progress
xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
xfs: directory scrub should check the null bestfree entries too
xfs: strengthen rmap record flags checking
xfs: fix the minrecs logic when dealing with inode root child blocks
Linus Torvalds [Sat, 21 Nov 2020 18:33:33 +0000 (10:33 -0800)]
Merge tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"
* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent
Linus Torvalds [Sat, 21 Nov 2020 18:24:05 +0000 (10:24 -0800)]
Merge tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
fixes an audit reporting oversight noticed in both seccomp and ptrace.
- Fix typos in seccomp selftests on powerpc and sh (Kees Cook)
- Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
Salaün)"
* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: sh: Fix register names
selftests/seccomp: powerpc: Fix typo in macro variable name
seccomp: Set PF_SUPERPRIV when checking capability
ptrace: Set PF_SUPERPRIV when checking capability
* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
s390/dasd: fix null pointer dereference for ERP requests
blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
nvme: fix memory leak freeing command effects
nvme: directly cache command effects log
nvme: free sq/cq dbbuf pointers when dbbuf set fails
block: mark flush request as IDLE when it is really finished
Linus Torvalds [Fri, 20 Nov 2020 19:47:22 +0000 (11:47 -0800)]
Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:
- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
Linus Torvalds [Fri, 20 Nov 2020 18:20:16 +0000 (10:20 -0800)]
Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"Two straightforward vt-d fixes:
- Fix boot when intel iommu initialisation fails under TXT (tboot)
- Fix intel iommu compilation error when DMAR is enabled without ATS
and temporarily update IOMMU MAINTAINERs entry"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
MAINTAINERS: Temporarily add myself to the IOMMU entry
iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
iommu/vt-d: Avoid panic if iommu init fails in tboot system
Linus Torvalds [Fri, 20 Nov 2020 18:16:26 +0000 (10:16 -0800)]
Merge tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:
- sdhci-of-arasan: Stabilize communication by fixing tap value configs
- sdhci-pci: Use SDR25 timing for HS mode for BYT-based Intel HWs"
* tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-arasan: Issue DLL reset explicitly
mmc: sdhci-of-arasan: Use Mask writes for Tap delays
mmc: sdhci-of-arasan: Allow configuring zero tap values
mmc: sdhci-pci: Prefer SDR25 timing for High Speed mode for BYT-based Intel controllers
Linus Torvalds [Fri, 20 Nov 2020 17:56:16 +0000 (09:56 -0800)]
Merge tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes: the only core change is a minor error
code handling in the control API, and all the rest are device-specific
fixes, mostly quirks, fixups and ASoC Intel fixes.
It looks boring, and good so"
* tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: mixart: Fix mutex deadlock
ALSA: hda/ca0132: Fix compile warning without PCI
ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock
ALSA: usb-audio: Add delay quirk for all Logitech USB devices
ASoC: Intel: catpt: Correct clock selection for dai trigger
ASoC: Intel: catpt: Skip position update for unprepared streams
ASoC: qcom: lpass-platform: Fix memory leak
ASoC: Intel: KMB: Fix S24_LE configuration
ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid
ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo
ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
ASoC: rt1015: increase the time to detect BCLK
ALSA: ctl: fix error path at adding user-defined element set
ALSA: hda/realtek - HP Headset Mic can't detect after boot
ALSA: hda/realtek - Add supported mute Led for HP
ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button
ASoC: rt1015: add delay to fix pop noise from speaker
Linus Torvalds [Fri, 20 Nov 2020 17:49:25 +0000 (09:49 -0800)]
Merge tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
This contains some fixes for sun4i/dw-hdmi probing, then amdgpu
enables arcturus hw without experimental flag and two other fixes and
a group of i915 fixes.
It also has a backported from next fix for the warn on reported in
ast/drm_gem_vram_helper code in the merge window. There's a separate
report which initially looked to be the same problem, but I'm going to
chase that up next week a bit more as I don't think the bisect landed
anywhere useful.
Summary:
core:
- vram helper TTM regression fix
amdgpu:
- Pageflip fix for navi1x with 5 or 6 displays
- Remove experimental flag for Arcturus
- Fix regression in atomic commit tail rework
* tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gt: Fixup tgl mocs for PTE tracking
drm/vram-helper: Fix use of top-down placement
drm/i915/gt: Remember to free the virtual breadcrumbs
drm/i915: Handle max_bpc==16
drm/amd/display: Always get CRTC updated constant values inside commit tail
drm/sun4i: backend: Fix probe failure with multiple backends
drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
drm/i915/selftests: Fix wrong return value of perf_request_latency()
drm/i915/selftests: Fix wrong return value of perf_series_engines()
drm/i915: Avoid memory leak with more than 16 workarounds on a list
drm/i915/tgl: Fix Media power gate sequence.
drm/amdgpu: remove experimental flag from arcturus
drm/amd/display: Add missing pflip irq for dcn2.0
drm/i915/gvt: return error when failing to take the module reference
drm: bridge: dw-hdmi: Avoid resetting force in the detect function
drm/i915/gvt: Set ENHANCED_FRAME_CAP bit
drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL
Dexuan Cui [Wed, 18 Nov 2020 00:03:05 +0000 (16:03 -0800)]
video: hyperv_fb: Fix the cache type when mapping the VRAM
x86 Hyper-V used to essentially always overwrite the effective cache type
of guest memory accesses to WB. This was problematic in cases where there
is a physical device assigned to the VM, since that often requires that
the VM should have control over cache types. Thus, on newer Hyper-V since
2018, Hyper-V always honors the VM's cache type, but unexpectedly Linux VM
users start to complain that Linux VM's VRAM becomes very slow, and it
turns out that Linux VM should not map the VRAM uncacheable by ioremap().
Fix this slowness issue by using ioremap_cache().
On ARM64, ioremap_cache() is also required as the host also maps the VRAM
cacheable, otherwise VM Connect can't display properly with ioremap() or
ioremap_wc().
With this change, the VRAM on new Hyper-V is as fast as regular RAM, so
it's no longer necessary to use the hacks we added to mitigate the
slowness, i.e. we no longer need to allocate physical memory and use
it to back up the VRAM in Generation-1 VM, and we also no longer need to
allocate physical memory to back up the framebuffer in a Generation-2 VM
and copy the framebuffer to the real VRAM. A further big change will
address these for v5.11.
Fixes: 68a2d20b79b1 ("drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver") Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/20201118000305.24797-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
Jan Kara [Wed, 18 Nov 2020 15:30:32 +0000 (16:30 +0100)]
ext4: fix bogus warning in ext4_update_dx_flag()
The idea of the warning in ext4_update_dx_flag() is that we should warn
when we are clearing EXT4_INODE_INDEX on a filesystem with metadata
checksums enabled since after clearing the flag, checksums for internal
htree nodes will become invalid. So there's no need to warn (or actually
do anything) when EXT4_INODE_INDEX is not set.
Link: https://lore.kernel.org/r/20201118153032.17281-1-jack@suse.cz Fixes: 48a34311953d ("ext4: fix checksum errors with indexed dirs") Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
GVT Fixes: It temporarily disables VFIO edid
feature on BXT/APL until its virtual display is really fixed to make
it work properly. And fixes for DPCD 1.2 and error return in taking
module reference.
Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons. While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets. The other bits are not important for key comparisons for index
lookups, and never have been.
Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.
Reported-by: Eric Sandeen <sandeen@sandeen.net> Fixes: 6ff646b2ceb0 ("xfs: fix rmap key and record comparison functions") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Linus Torvalds [Thu, 19 Nov 2020 21:33:16 +0000 (13:33 -0800)]
Merge tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc5, including fixes from the WiFi
(mac80211), can and bpf (including the strncpy_from_user fix).
Current release - regressions:
- mac80211: fix memory leak of filtered powersave frames
- mac80211: free sta in sta_info_insert_finish() on errors to avoid
sleeping in atomic context
- netlabel: fix an uninitialized variable warning added in -rc4
Previous release - regressions:
- vsock: forward all packets to the host when no H2G is registered,
un-breaking AWS Nitro Enclaves
- net: Exempt multicast addresses from five-second neighbor lifetime
requirement, decreasing the chances neighbor tables fill up
- net/tls: fix corrupted data in recvmsg
- qed: fix ILT configuration of SRC block
- can: m_can: process interrupt only when not runtime suspended
Previous release - always broken:
- page_frag: Recover from memory pressure by not recycling pages
allocating from the reserves
- strncpy_from_user: Mask out bytes after NUL terminator
- ip_tunnels: Set tunnel option flag only when tunnel metadata is
present, always setting it confuses Open vSwitch
- bpf, sockmap:
- Fix partial copy_page_to_iter so progress can still be made
- Fix socket memory accounting and obeying SO_RCVBUF
- net: Have netpoll bring-up DSA management interface
- net: bridge: add missing counters to ndo_get_stats64 callback
- tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt
- enetc: Workaround MDIO register access HW bug
- net/ncsi: move netlink family registration to a subsystem init,
instead of tying it to driver probe
- net: ftgmac100: unregister NC-SI when removing driver to avoid
crash
- lan743x:
- prevent interrupt storm on open
- fix freeing skbs in the wrong context
- net/mlx5e: Fix socket refcount leak on kTLS RX resync
- net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097
- fix 21 unset return codes and other mistakes on error paths, mostly
detected by the Hulk Robot"
* tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (115 commits)
fail_function: Remove a redundant mutex unlock
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid()
net/smc: fix matching of existing link groups
ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module
libbpf: Fix VERSIONED_SYM_COUNT number parsing
net/mlx4_core: Fix init_hca fields offset
atm: nicstar: Unmap DMA on send error
page_frag: Recover from memory pressure
net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset
mlxsw: core: Use variable timeout for EMAD retries
mlxsw: Fix firmware flashing
net: Have netpoll bring-up DSA management interface
atl1e: fix error return code in atl1e_probe()
atl1c: fix error return code in atl1c_probe()
ah6: fix error return code in ah6_input()
net: usb: qmi_wwan: Set DTR quirk for MR400
can: m_can: process interrupt only when not runtime suspended
can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery
...
Linus Torvalds [Thu, 19 Nov 2020 21:01:53 +0000 (13:01 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"The last two weeks have been quiet here, just the usual smattering of
long standing bug fixes.
A collection of error case bug fixes:
- Improper nesting of spinlock types in cm
- Missing error codes and kfree()
- Ensure dma_virt_ops users have the right kconfig symbols to work
properly
- Compilation failure of tools/testing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
tools/testing/scatterlist: Fix test to compile and run
IB/hfi1: Fix error return code in hfi1_init_dd()
RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device()
RDMA/cm: Make the local_id_table xarray non-irq
Theodore Ts'o [Thu, 19 Nov 2020 20:36:25 +0000 (15:36 -0500)]
ext4: drop fast_commit from /proc/mounts
The options in /proc/mounts must be valid mount options --- and
fast_commit is not a mount option. Otherwise, command sequences like
this will fail:
# mount /dev/vdc /vdc
# mkdir -p /vdc/phoronix_test_suite /pts
# mount --bind /vdc/phoronix_test_suite /pts
# mount -o remount,nodioread_nolock /pts
mount: /pts: mount point not mounted or bad option.
And in the system logs, you'll find:
EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value
Fixes: 995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
====================
1) libbpf should not attempt to load unused subprogs, from Andrii.
2) Make strncpy_from_user() mask out bytes after NUL terminator, from Daniel.
3) Relax return code check for subprograms in the BPF verifier, from Dmitrii.
4) Fix several sockmap issues, from John.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
fail_function: Remove a redundant mutex unlock
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
libbpf: Fix VERSIONED_SYM_COUNT number parsing
bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
bpf, sockmap: Use truesize with sk_rmem_schedule()
bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made
selftests/bpf: Fix error return code in run_getsockopt_test()
bpf: Relax return code check for subprograms
tools, bpftool: Add missing close before bpftool net attach exit
MAINTAINERS/bpf: Update Andrii's entry.
selftests/bpf: Fix unused attribute usage in subprogs_unused test
bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id
bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare
libbpf: Don't attempt to load unused subprog as an entry-point BPF program
====================
Chris Wilson [Thu, 15 Oct 2020 12:21:38 +0000 (13:21 +0100)]
drm/i915/gt: Fixup tgl mocs for PTE tracking
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
(cherry picked from commit a04ac827366594c7244f60e9be79fcb404af69f0) Fixes: 849c0fe9e831 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices") Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Updated Fixes tag]
6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
kernel}_str helpers") introduced a subtle bug where
bpf_probe_read_user_str() would potentially copy a few extra bytes after
the NUL terminator.
This issue is particularly nefarious when strings are used as map keys,
as seemingly identical strings can occupy multiple entries in a map.
This patchset fixes the issue and introduces a selftest to prevent
future regressions.
v6 -> v7:
* Add comments
v5 -> v6:
* zero-pad up to sizeof(unsigned long) after NUL
Daniel Xu [Tue, 17 Nov 2020 20:05:46 +0000 (12:05 -0800)]
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
Previously, bpf_probe_read_user_str() could potentially overcopy the
trailing bytes after the NUL due to how do_strncpy_from_user() does the
copy in long-sized strides. The issue has been fixed in the previous
commit.
This commit adds a selftest that ensures we don't regress
bpf_probe_read_user_str() again.
Daniel Xu [Tue, 17 Nov 2020 20:05:45 +0000 (12:05 -0800)]
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter for
normal string operations. However, when BPF programs key BPF maps with
strings, this matters a lot.
A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
destination buffer. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.
The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.
This commit masks out the bytes following the NUL while preserving
long-sized stride in the fast path.
Linus Torvalds [Thu, 19 Nov 2020 19:32:31 +0000 (11:32 -0800)]
Merge tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes for CVE-2020-4788.
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used for example to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and
after the kernel performs any user accesses (patch 3). It also adds a
self-test and performs some related cleanups"
* tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
selftests/powerpc: refactor entry and rfi_flush tests
selftests/powerpc: entry flush test
powerpc: Only include kup-radix.h for 64-bit Book3S
powerpc/64s: flush L1D after user accesses
powerpc/64s: flush L1D on kernel entry
selftests/powerpc: rfi_flush: disable entry flush if present
Linus Torvalds [Thu, 19 Nov 2020 19:22:33 +0000 (11:22 -0800)]
Merge tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix placement of cache alias remapping area
- disable preemption around cache alias management calls
- add missing __user annotation to strncpy_from_user argument
* tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: uaccess: Add missing __user to strncpy_from_user() prototype
xtensa: disable preemption around cache alias management calls
xtensa: fix TLBTEMP area placement
Thomas Zimmermann [Mon, 21 Sep 2020 14:25:36 +0000 (16:25 +0200)]
drm/vram-helper: Fix use of top-down placement
Commit 7053e0eab473 ("drm/vram-helper: stop using TTM placement flags")
cleared the BO placement flags if top-down placement had been selected.
Hence, BOs that were supposed to go into VRAM are now placed in a default
location in system memory.
Trying to scanout the incorrectly pinned BO results in displayed garbage
and an error message.
Fix the bug by keeping the placement flags. The top-down placement flag
is stored in a separate variable.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 7053e0eab473 ("drm/vram-helper: stop using TTM placement flags") Reported-by: Pu Wen <puwen@hygon.cn> [for 5.10-rc1] Tested-by: Pu Wen <puwen@hygon.cn> Cc: Christian König <christian.koenig@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20200921142536.4392-1-tzimmermann@suse.de
(cherry picked from commit b8f8dbf6495850b0babc551377bde754b7bc0eea)
[pulled into fixes from drm-next] Signed-off-by: Dave Airlie <airlied@redhat.com>