Linus Torvalds [Fri, 28 Jan 2022 17:00:26 +0000 (19:00 +0200)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two larger x86 series:
- Redo incorrect fix for SEV/SMAP erratum
- Windows 11 Hyper-V workaround
Other x86 changes:
- Various x86 cleanups
- Re-enable access_tracking_perf_test
- Fix for #GP handling on SVM
- Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID
- Fix for ICEBP in interrupt shadow
- Avoid false-positive RCU splat
- Enable Enlightened MSR-Bitmap support for real
ARM:
- Correctly update the shadow register on exception injection when
running in nVHE mode
- Correctly use the mm_ops indirection when performing cache
invalidation from the page-table walker
- Restrict the vgic-v3 workaround for SEIS to the two known broken
implementations
Generic code changes:
- Dead code cleanup"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
KVM: eventfd: Fix false positive RCU usage warning
KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
KVM: nVMX: Rename vmcs_to_field_offset{,_table}
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
KVM: x86: add system attribute to retrieve full set of supported xsave states
KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
selftests: kvm: move vm_xsave_req_perm call to amx_test
KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
KVM: x86: Check .flags in kvm_cpuid_check_equal() too
KVM: x86: Forcibly leave nested virt when SMM state is toggled
KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()
KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real
...
Linus Torvalds [Fri, 28 Jan 2022 16:50:05 +0000 (18:50 +0200)]
Merge tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix loading of modules with lots of relocations and add a regression
test for it.
- Fix machine check handling for vector validity and guarded storage
validity failures in KVM guests.
- Fix hypervisor performance data to include z/VM guests with access
control group set.
- Fix z900 build problem in uaccess code.
- Update defconfigs.
* tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/hypfs: include z/VM guests with access control group set
s390: update defconfigs
s390/module: test loading modules with a lot of relocations
s390/module: fix loading modules with a lot of relocations
s390/uaccess: fix compile error
s390/nmi: handle vector validity failures for KVM guests
s390/nmi: handle guarded storage validity failures for KVM guests
Fixes: 7345201c3963 ("IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_path") Link: https://lore.kernel.org/r/7615f23bbb5c5b66d03f6fa13e1c99d51dae6916.1642581448.git.leonro@nvidia.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Fri, 28 Jan 2022 16:36:42 +0000 (18:36 +0200)]
Merge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable)"
* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
ceph: set pool_ns in new inode layout for async creates
ceph: properly put ceph_string reference after async create attempt
ceph: put the requests/sessions when it fails to alloc memory
David Howells [Thu, 27 Jan 2022 16:02:34 +0000 (16:02 +0000)]
Fix a warning about a malformed kernel doc comment in cifs
Fix by removing the extra asterisk.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Fri, 28 Jan 2022 08:00:29 +0000 (10:00 +0200)]
ocfs2: fix subdirectory registration with register_sysctl()
The kernel test robot reports that commit c42ff46f97c1 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like
sysctl table check failed: fs/ocfs2/nm Not a file
sysctl table check failed: fs/ocfs2/nm No proc_handler
sysctl table check failed: fs/ocfs2/nm bogus .mode 0555
and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report. Possibly that original report
only ever made it to the lkp list.
The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.
Catalin Marinas [Fri, 28 Jan 2022 16:14:06 +0000 (16:14 +0000)]
Merge tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into for-next/fixes
coresight: trbe: Workaround Cortex-A510 erratas
This pull request is providing arm64 definitions to support
TRBE Cortex-A510 erratas.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
* tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux:
arm64: errata: Add detection for TRBE trace data corruption
arm64: errata: Add detection for TRBE invalid prohibited states
arm64: errata: Add detection for TRBE ignored system register writes
arm64: Add Cortex-A510 CPU part definition
Peter Ujfalusi [Wed, 26 Jan 2022 10:03:25 +0000 (12:03 +0200)]
ASoC: rt5682: Fix deadlock on resume
On resume from suspend the following chain of events can happen:
A rt5682_resume() -> mod_delayed_work() for jack_detect_work
B DAPM sequence starts ( DAPM is locked now)
A1. rt5682_jack_detect_handler() scheduled
- Takes both jdet_mutex and calibrate_mutex
- Calls in to rt5682_headset_detect() which tries to take DAPM lock, it
starts to wait for it as B path took it already.
B1. DAPM sequence reaches the "HP Amp", rt5682_hp_event() tries to take
the jdet_mutex, but it is locked in A1, so it waits.
Deadlock.
To solve the deadlock, drop the jdet_mutex, use the jack_detect_work to do
the jack removal handling, move the dapm lock up one level to protect the
most of the rt5682_jack_detect_handler(), but not the jack reporting as it
might trigger a DAPM sequence.
The rt5682_headset_detect() can be changed to static as well.
Dmitry Osipenko [Wed, 12 Jan 2022 19:50:39 +0000 (22:50 +0300)]
ASoC: hdmi-codec: Fix OOB memory accesses
Correct size of iec_status array by changing it to the size of status
array of the struct snd_aes_iec958. This fixes out-of-bounds slab
read accesses made by memcpy() of the hdmi-codec driver. This problem
is reported by KASAN.
Takashi Iwai [Wed, 19 Jan 2022 15:52:49 +0000 (16:52 +0100)]
ASoC: soc-pcm: Move debugfs removal out of spinlock
The recent fix for DPCM locking also covered the loop in
dpcm_be_disconnect() with the FE stream lock. This caused an
unexpected side effect, thought: calling debugfs_remove_recursive() in
the spinlock may lead to lockdep splats as the code there assumes the
SOFTIRQ-safe context.
For avoiding the problem, this patch changes the disconnection
procedure to two phases: at first, the matching entries are removed
from the linked list, then the resources are freed outside the lock.
Fixes: b7898396f4bb ("ASoC: soc-pcm: Fix and cleanup DPCM locking") Reported-and-tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20220119155249.26754-3-tiwai@suse.de Signed-off-by: Mark Brown <broonie@kernel.org>
Takashi Iwai [Wed, 19 Jan 2022 15:52:48 +0000 (16:52 +0100)]
ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
The recent change for DPCM locking caused spurious lockdep warnings.
Actually the warnings are false-positive, as those are triggered due
to the nested stream locks for FE and BE. Since both locks belong to
the same lock class, lockdep sees it as if a deadlock.
For fixing this, we need to take PCM stream locks for BE with the
nested lock primitives. Since currently snd_pcm_stream_lock*() helper
assumes only the top-level single locking, a new helper function
snd_pcm_stream_lock_irqsave_nested() is defined for a single-depth
nested lock, which is now used in the BE DAI trigger that is always
performed inside a FE stream lock.
Linus Torvalds [Fri, 28 Jan 2022 15:51:31 +0000 (17:51 +0200)]
Merge tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
and one fanotify cleanup"
* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix fsnotify hooks in pseudo filesystems
fsnotify: invalidate dcache before IN_DELETE event
fanotify: remove variable set but not used
Leon Romanovsky [Tue, 18 Jan 2022 07:35:01 +0000 (09:35 +0200)]
RDMA/ucma: Protect mc during concurrent multicast leaves
Partially revert the commit mentioned in the Fixes line to make sure that
allocation and erasing multicast struct are locked.
BUG: KASAN: use-after-free in ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline]
BUG: KASAN: use-after-free in ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579
Read of size 8 at addr ffff88801bb74b00 by task syz-executor.1/25529
CPU: 0 PID: 25529 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline]
ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579
ucma_destroy_id+0x1e6/0x280 drivers/infiniband/core/ucma.c:614
ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xae0 fs/read_write.c:588
ksys_write+0x1ee/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Currently the xarray search can touch a concurrently freeing mc as the
xa_for_each() is not surrounded by any lock. Rather than hold the lock for
a full scan hold it only for the effected items, which is usually an empty
list.
Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray") Link: https://lore.kernel.org/r/1cda5fabb1081e8d16e39a48d3a4f8160cea88b8.1642491047.git.leonro@nvidia.com Reported-by: syzbot+e3f96c43d19782dd14a7@syzkaller.appspotmail.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Fri, 28 Jan 2022 15:19:49 +0000 (17:19 +0200)]
Merge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
cleanup"
* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: cleanup double word in comment
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
Jisheng Zhang [Fri, 28 Jan 2022 14:15:50 +0000 (22:15 +0800)]
net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()
There are two issues with runtime pm handling in stmmac_dvr_remove():
1. the mac is runtime suspended before stopping dma and rx/tx. We
need to ensure the device is properly resumed back.
2. the stmmaceth clk enable/disable isn't balanced in both exit and
error handling code path. Take the exit code path for example, when we
unbind the driver or rmmod the driver module, the mac is runtime
suspended as said above, so the stmmaceth clk is disabled, but
stmmac_dvr_remove()
stmmac_remove_config_dt()
clk_disable_unprepare()
CCF will complain this time. The error handling code path suffers
from the similar situtaion.
Here are kernel warnings in error handling code path on Allwinner D1
platform:
Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The panic happens in hfi1_ipoib_txreq_deinit() because there is a NULL
deref when hfi1_ipoib_netdev_dtor() is called in this error case.
hfi1_ipoib_txreq_init() and hfi1_ipoib_rxq_init() are self unwinding so
fix by adjusting the error paths accordingly.
Other changes:
- hfi1_ipoib_free_rdma_netdev() is deleted including the free_netdev()
since the netdev core code deletes calls free_netdev()
- The switch to the accelerated entrances is moved to the success path.
Cc: stable@vger.kernel.org Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/1642287756-182313-4-git-send-email-mike.marciniszyn@cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For ipoib, the txqueuelen is modified with the module parameter
send_queue_size.
Fix by changing to use kv versions of the same allocator to handle the
large allocations. The allocation embeds a hdr struct that is dma mapped.
Change that struct to a pointer to a kzalloced struct.
Cc: stable@vger.kernel.org Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/1642287756-182313-3-git-send-email-mike.marciniszyn@cornelisnetworks.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
David S. Miller [Fri, 28 Jan 2022 15:10:45 +0000 (15:10 +0000)]
Merge tag 'ieee802154-for-net-2022-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2022-01-28
An update from ieee802154 for your *net* tree.
A bunch of fixes in drivers, all from Miquel Raynal.
Clarifying the default channel in hwsim, leak fixes in at86rf230 and ca8210 as
well as a symbol duration fix for mcr20a. Topping up the driver fixes with
better error codes in nl802154 and a cleanup in MAINTAINERS for an orphaned
driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyue Wang [Fri, 28 Jan 2022 10:47:14 +0000 (18:47 +0800)]
gve: fix the wrong AdminQ buffer queue index check
The 'tail' and 'head' are 'unsigned int' type free-running count, when
'head' is overflow, the 'int i (= tail) < u32 head' will be false:
Only '- loop 0: idx = 63' result is shown, so it needs to use 'int' type
to compare, it can handle the overflow correctly.
typedef uint32_t u32;
int main()
{
u32 tail, head;
int stail, shead;
int i, loop;
tail = 0xffffffff;
head = 0x00000000;
for (i = tail, loop = 0; i < head; i++) {
unsigned int idx = i & 63;
printf("+ loop %d: idx = %u\n", loop++, idx);
}
stail = tail;
shead = head;
for (i = stail, loop = 0; i < shead; i++) {
unsigned int idx = i & 63;
printf("- loop %d: idx = %u\n", loop++, idx);
}
return 0;
}
Fixes: 5cdad90de62c ("gve: Batch AQ commands for creating and destroying queues.") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Fri, 28 Jan 2022 04:47:16 +0000 (12:47 +0800)]
ax25: add refcount in ax25_dev to avoid UAF bugs
If we dereference ax25_dev after we call kfree(ax25_dev) in
ax25_dev_device_down(), it will lead to concurrency UAF bugs.
There are eight syscall functions suffer from UAF bugs, include
ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(),
ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and
ax25_info_show().
The root cause of UAF bugs is that kfree(ax25_dev) in
ax25_dev_device_down() is not protected by any locks.
When ax25_dev, which there are still pointers point to,
is released, the concurrency UAF bug will happen.
This patch introduces refcount into ax25_dev in order to
guarantee that there are no pointers point to it when ax25_dev
is released.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Fri, 28 Jan 2022 04:47:15 +0000 (12:47 +0800)]
ax25: improve the incomplete fix to avoid UAF and NPD bugs
The previous commit 1ade48d0c27d ("ax25: NPD bug when detaching
AX25 device") introduce lock_sock() into ax25_kill_by_device to
prevent NPD bug. But the concurrency NPD or UAF bug will occur,
when lock_sock() or release_sock() dereferences the ax25_cb->sock.
The NULL pointer dereference bug can be shown as below:
The root cause is that the sock is set to null before dereference
site (1) or (2). Therefore, this patch extracts the ax25_cb->sock
in advance, and uses ax25_list_lock to protect it, which can synchronize
with ax25_cb_del() and ensure the value of sock is not null before
dereference sites.
The root cause is that the sock is released before dereference
site (1) or (2). Therefore, this patch uses sock_hold() to increase
the refcount of sock and uses ax25_list_lock to protect it, which
can synchronize with ax25_cb_del() in ax25_destroy_socket() and
ensure the sock wil not be released before dereference sites.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the pin names from "MIO%d" to "MIO-%d", but all dts
in arch/arm64/boot/dts/xilinx still use the old name. As a result my
ZCU104 has no output on serial terminal and is not reachable over
network.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Yuji Ishikawa [Thu, 27 Jan 2022 12:17:14 +0000 (21:17 +0900)]
net: stmmac: dwmac-visconti: No change to ETHER_CLOCK_SEL for unexpected speed request.
Variable clk_sel_val is not initialized in the default case of the first switch statement.
In that case, the function should return immediately without any changes to the hardware.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver") Signed-off-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp> Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.
Fixes: 2b63482185e6 ("nfsd: fix clid_inuse on mount with security change") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Bruce Fields <bfields@fieldses.org>
Rob Herring [Wed, 26 Jan 2022 23:13:26 +0000 (17:13 -0600)]
spi: dt-bindings: Fix 'reg' child node schema
The schema for SPI child nodes' 'reg' property is not complete. 'reg' is
a matrix of cells. The schema needs to define both the number of 'reg'
entries and constraints on each entry.
Kamal Dasu [Thu, 27 Jan 2022 18:53:59 +0000 (13:53 -0500)]
spi: bcm-qspi: check for valid cs before applying chip select
Apply only valid chip select value. This change fixes case where chip
select is set to initial value of '-1' during probe and PM supend and
subsequent resume can try to use the value with undefined behaviour.
Also in case where gpio based chip select, the check in
bcm_qspi_chip_select() shall prevent undefined behaviour on resume.
Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver") Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220127185359.27322-1-kdasu.kdev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().
Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:34 +0000 (18:01 +0100)]
KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
Enlightened VMCS interface is in use:
"Any VMREAD or VMWRITE instructions while an enlightened VMCS is
active is unsupported and can result in unexpected behavior.""
Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
0x4404 ("VM-exit interruption information") are observed. Failing
these attempts with nested_vmx_failInvalid() makes such guests
unbootable.
Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
eventually but for the time being we need a workaround. (Temporary) allow
VMREAD to get data from the currently loaded Enlightened VMCS.
Note: VMWRITE instructions remain forbidden, it is not clear how to
handle them properly and hopefully won't ever be needed.
Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:33 +0000 (18:01 +0100)]
KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
In preparation to allowing reads from Enlightened VMCS from
handle_vmread(), implement evmcs_field_offset() to get the correct
read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V,
is almost what's needed but a few things need to be adjusted. First,
WARN_ON() is unacceptable for handle_vmread() as any field can (in
theory) be supplied by the guest and not all fields are defined in
eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields).
It also sounds like a good idea to WARN_ON() if such fields are ever
accessed by KVM-on-Hyper-V.
Implement dedicated evmcs_field_offset() helper.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:32 +0000 (18:01 +0100)]
KVM: nVMX: Rename vmcs_to_field_offset{,_table}
vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
blob which is not supposed to be accessed directly. In fact,
vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.
Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity.
No functional change intended.
Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:31 +0000 (18:01 +0100)]
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.
Note, none of the currently existing Windows/Hyper-V versions are known
to enable 'save VMX-preemption timer value' when eVMCS is in use, the
change is aimed at making the filtering future proof.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 12 Jan 2022 17:01:30 +0000 (18:01 +0100)]
KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
on 'true' MSR data.
Note, none of the currently existing Windows/Hyper-V versions are known
to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
is aimed at making the filtering future proof.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 26 Jan 2022 12:49:45 +0000 (07:49 -0500)]
KVM: x86: add system attribute to retrieve full set of supported xsave states
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled. Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use. In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.
Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Thu, 27 Jan 2022 15:31:53 +0000 (07:31 -0800)]
KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
Add a helper to handle converting the u64 userspace address embedded in
struct kvm_device_attr into a userspace pointer, it's all too easy to
forget the intermediate "unsigned long" cast as well as the truncation
check.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Roger Pau Monne [Fri, 21 Jan 2022 09:01:46 +0000 (10:01 +0100)]
xen/x2apic: enable x2apic mode when supported for HVM
There's no point in disabling x2APIC mode when running as a Xen HVM
guest, just enable it when available.
Remove some unneeded wrapping around the detection functions, and
simply provide a xen_x2apic_available helper that's a wrapper around
x2apic_supported.
Mark Brown [Mon, 24 Jan 2022 17:55:27 +0000 (17:55 +0000)]
kselftest/arm64: Correct logging of FPSIMD register read via ptrace
There's a cut'n'paste error in the logging for our test for reading register
state back via ptrace, correctly say that we did a read instead of a write.
Mark Brown [Mon, 24 Jan 2022 17:55:26 +0000 (17:55 +0000)]
kselftest/arm64: Skip VL_INHERIT tests for unsupported vector types
Currently we unconditionally test the ability to set the vector length
inheritance flag via ptrace meaning that we generate false failures on
systems that don't support SVE when we attempt to set the vector length
there. Check the hwcap and mark the tests as skipped when it's not present.
Fixes: 0ba1ce1e8605 ("selftests: arm64: Add coverage of ptrace flags for SVE VL inheritance") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220124175527.3260234-2-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 28 Jan 2022 09:47:05 +0000 (11:47 +0200)]
Merge tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix for 5.17-rc2, adding a missing resource allocation error
check in the pata_platform driver, from Zhou"
* tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_platform: Fix a NULL pointer dereference in __pata_platform_probe()
Takashi Iwai [Thu, 27 Jan 2022 13:57:17 +0000 (14:57 +0100)]
ALSA: hda: Fix signedness of sscanf() arguments
The %x format of sscanf() takes an unsigned int pointer, while we pass
a signed int pointer. Practically it's OK, but this may result in a
compile warning. Let's fix it.
Linus Torvalds [Fri, 28 Jan 2022 07:48:20 +0000 (09:48 +0200)]
Merge tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix crash in nct6775 driver
- Prevent divide by zero in adt7470 driver
- Fix conditional compile warning in pmbus/ir38064 driver
- Various minor fixes in lm90 driver
* tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix crash in clear_caseopen
hwmon: (adt7470) Prevent divide by zero in adt7470_fan_write()
hwmon: (pmbus/ir38064) Mark ir38064_of_match as __maybe_unused
hwmon: (lm90) Fix sysfs and udev notifications
hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
hwmon: (lm90) Mark alert as broken for MAX6680
hwmon: (lm90) Mark alert as broken for MAX6654
hwmon: (lm90) Re-enable interrupts after alert clears
hwmon: (lm90) Reduce maximum conversion rate for G781
* tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen
Revert "drm/ast: Support 1600x900 with 108MHz PCLK"
drm/amdgpu/display: Remove t_srx_delay_us.
drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.
drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.
drm/amd/display/dc/calcs/dce_calcs: Fix a memleak in calculate_bandwidth()
drm/amdgpu/display: use msleep rather than udelay for long delays
drm/amdgpu/display: adjust msleep limit in dp_wait_for_training_aux_rd_interval
drm/amdgpu: filter out radeon secondary ids as well
drm/amd/display: change FIFO reset condition to embedded display only
drm/amd/display: Correct MPC split policy for DCN301
drm/amd/display: Fix for otg synchronization logic
drm/etnaviv: relax submit size limits
drm/msm/gpu: Cancel idle/boost work on suspend
drm/msm/gpu: Wait for idle before suspending
drm/atomic: Add the crtc to affected crtc only if uapi.enable = true
drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
drm/msm/a6xx: Add missing suspend_count increment
drm/msm: Fix wrong size calculation
drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
...
Fabio Estevam [Mon, 27 Dec 2021 16:14:02 +0000 (13:14 -0300)]
ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
Currently, SD card fails to mount due to the following pinctrl error:
[ 11.170000] imx23-pinctrl 80018000.pinctrl: pin SSP1_DETECT already requested by 80018000.pinctrl; cannot claim for 80010000.spi
[ 11.180000] imx23-pinctrl 80018000.pinctrl: pin-65 (80010000.spi) status -22
[ 11.190000] imx23-pinctrl 80018000.pinctrl: could not request pin 65 (SSP1_DETECT) from group mmc0-pins-fixup.0 on device 80018000.pinctrl
[ 11.200000] mxs-mmc 80010000.spi: Error applying setting, reverse things back
Fix it by removing the MX23_PAD_SSP1_DETECT pin from the hog group as it
is already been used by the mmc0-pins-fixup pinctrl group.
With this change the rootfs can be mounted and the imx23-evk board can
boot successfully.
1) Remove leftovers from flowtable modules, from Geert Uytterhoeven.
2) Missing refcount increment of conntrack template in nft_ct,
from Florian Westphal.
3) Reduce nft_zone selftest time, also from Florian.
4) Add selftest to cover stateless NAT on fragments, from Florian Westphal.
5) Do not set net_device when for reject packets from the bridge path,
from Phil Sutter.
6) Cancel register tracking info on nft_byteorder operations.
7) Extend nft_concat_range selftest to cover set reload with no elements,
from Florian Westphal.
8) Remove useless update of pointer in chain blob builder, reported
by kbuild test robot.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nf_tables: remove assignment with no effect in chain blob builder
selftests: nft_concat_range: add test for reload with no element add/del
netfilter: nft_byteorder: track register operations
netfilter: nft_reject_bridge: Fix for missing reply from prerouting
selftests: netfilter: check stateless nat udp checksum fixup
selftests: netfilter: reduce zone stress test running time
netfilter: nft_ct: fix use after free when attaching zone template
netfilter: Remove flowtable relics
====================
Shyam Sundar S K [Thu, 27 Jan 2022 09:20:03 +0000 (14:50 +0530)]
net: amd-xgbe: Fix skb data length underflow
There will be BUG_ON() triggered in include/linux/skbuff.h leading to
intermittent kernel panic, when the skb length underflow is detected.
Fix this by dropping the packet if such length underflows are seen
because of inconsistencies in the hardware descriptors.
Fixes: 622c36f143fc ("amd-xgbe: Fix jumbo MTU processing on newer hardware") Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20220127092003.2812745-1-Shyam-sundar.S-k@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tom Zanussi [Thu, 27 Jan 2022 21:44:16 +0000 (15:44 -0600)]
tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
The patch ec5ce0987541: "tracing: Allow whitespace to surround hist
trigger filter" from Jan 15, 2018, leads to the following Smatch
static checker warning:
kernel/trace/trace_events_hist.c:6199 event_hist_trigger_parse()
warn: 'p' can't be NULL.
Since p is always checked for a NULL value at the top of loop and
nothing in the rest of the loop will set it to NULL, the warning
is correct and might as well be 1 to silence the warning.
Tom Zanussi [Thu, 27 Jan 2022 21:44:15 +0000 (15:44 -0600)]
tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
The recent rename of event_hist_trigger_parse() caused smatch
re-evaluation of trace_events_hist.c and as a result an old warning
was found:
kernel/trace/trace_events_hist.c:6174 event_hist_trigger_parse()
error: we previously assumed 'glob' could be null (see line 6166)
glob should never be null (and apparently smatch can also figure that
out and skip the warning when using the cross-function DB (but which
can't be used with a 0day build as it takes too much time to
generate)).
Nonetheless for clarity, remove the test but add a WARN_ON() in case
the code ever changes.
Link: https://lkml.kernel.org/r/96925e5c1f116654ada7ea0613d930b1266b5e1c.1643319703.git.zanussi@kernel.org Fixes: f404da6e1d46c ("tracing: Add 'last error' error facility for hist triggers") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Shuah Khan [Wed, 26 Jan 2022 00:13:01 +0000 (17:13 -0700)]
rtla: Make doc build optional
rtla build fails due to doc build dependency on rst2man. Make
doc build optional so rtla could be built without docs. Leave
the install dependency on doc_install alone.
Kees Cook [Tue, 25 Jan 2022 22:00:37 +0000 (14:00 -0800)]
tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro
As done for trace_events.h, also fix the __rel_loc macro in perf.h,
which silences the -Warray-bounds warning:
In file included from ./include/linux/string.h:253,
from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./include/linux/mm_types_task.h:14,
from ./include/linux/mm_types.h:5,
from ./include/linux/buildid.h:5,
from ./include/linux/module.h:14,
from samples/trace_events/trace-events-sample.c:2:
In function '__fortify_strcpy',
inlined from 'perf_trace_foo_rel_loc' at samples/trace_events/./trace-events-sample.h:519:1:
./include/linux/fortify-string.h:47:33: warning: '__builtin_strcpy' offset 12 is out of the bounds [
0, 4] [-Warray-bounds]
47 | #define __underlying_strcpy __builtin_strcpy
| ^
./include/linux/fortify-string.h:445:24: note: in expansion of macro '__underlying_strcpy'
445 | return __underlying_strcpy(p, q);
| ^~~~~~~~~~~~~~~~~~~
Also make __data struct member a proper flexible array to avoid future
problems.
Link: https://lkml.kernel.org/r/20220125220037.2738923-1-keescook@chromium.org Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 55de2c0b5610c ("tracing: Add '__rel_loc' using trace event macros") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 25 Jan 2022 14:19:30 +0000 (23:19 +0900)]
tracing: Avoid -Warray-bounds warning for __rel_loc macro
Since -Warray-bounds checks the destination size from the type of given
pointer, __assign_rel_str() macro gets warned because it passes the
pointer to the 'u32' field instead of 'trace_event_raw_*' data structure.
Pass the data address calculated from the 'trace_event_raw_*' instead of
'u32' __rel_loc field.
Link: https://lkml.kernel.org/r/20220125233154.dac280ed36944c0c2fe6f3ac@kernel.org Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ This did not fix the warning, but is still a nice clean up ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Tue, 25 Jan 2022 14:19:10 +0000 (09:19 -0500)]
ftrace: Have architectures opt-in for mcount build time sorting
First S390 complained that the sorting of the mcount sections at build
time caused the kernel to crash on their architecture. Now PowerPC is
complaining about it too. And also ARM64 appears to be having issues.
It may be necessary to also update the relocation table for the values
in the mcount table. Not only do we have to sort the table, but also
update the relocations that may be applied to the items in the table.
If the system is not relocatable, then it is fine to sort, but if it is,
some architectures may have issues (although x86 does not as it shifts all
addresses the same).
Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is
safe to do the sorting at build time.
Also update the config to compile in build time sorting in the sorttable
code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT.
Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/ Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Yinan Liu <yinan@linux.alibaba.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cristian Marussi [Wed, 26 Jan 2022 10:27:19 +0000 (10:27 +0000)]
selftests: skip mincore.check_file_mmap when fs lacks needed support
Report mincore.check_file_mmap as SKIP instead of FAIL if the underlying
filesystem lacks support of O_TMPFILE or fallocate since such failures
are not really related to mincore functionality.
Muhammad Usama Anjum [Thu, 27 Jan 2022 17:44:46 +0000 (22:44 +0500)]
selftests: futex: Use variable MAKE instead of make
Recursive make commands should always use the variable MAKE, not the
explicit command name ‘make’. This has benefits and removes the
following warning when multiple jobs are used for the build:
make[2]: warning: jobserver unavailable: using -j1. Add '+' to parent make rule.
Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Anshuman Khandual [Tue, 25 Jan 2022 14:20:34 +0000 (19:50 +0530)]
arm64: errata: Add detection for TRBE trace data corruption
TRBE implementations affected by Arm erratum #1902691 might corrupt trace
data or deadlock, when it's being written into the memory. So effectively
TRBE is broken and hence cannot be used to capture trace data. This adds
a new errata ARM64_ERRATUM_1902691 in arm64 errata framework.
Anshuman Khandual [Tue, 25 Jan 2022 14:20:33 +0000 (19:50 +0530)]
arm64: errata: Add detection for TRBE invalid prohibited states
TRBE implementations affected by Arm erratum #2038923 might get TRBE into
an inconsistent view on whether trace is prohibited within the CPU. As a
result, the trace buffer or trace buffer state might be corrupted. This
happens after TRBE buffer has been enabled by setting TRBLIMITR_EL1.E,
followed by just a single context synchronization event before execution
changes from a context, in which trace is prohibited to one where it isn't,
or vice versa. In these mentioned conditions, the view of whether trace is
prohibited is inconsistent between parts of the CPU, and the trace buffer
or the trace buffer state might be corrupted. This adds a new errata
ARM64_ERRATUM_2038923 in arm64 errata framework.
Anshuman Khandual [Tue, 25 Jan 2022 14:20:32 +0000 (19:50 +0530)]
arm64: errata: Add detection for TRBE ignored system register writes
TRBE implementations affected by Arm erratum #2064142 might fail to write
into certain system registers after the TRBE has been disabled. Under some
conditions after TRBE has been disabled, writes into certain TRBE registers
TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1 and TRBTRG_EL1 will be
ignored and not be effected. This adds a new errata ARM64_ERRATUM_2064142
in arm64 errata framework.
Linus Torvalds [Thu, 27 Jan 2022 18:58:39 +0000 (20:58 +0200)]
Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.
Current release - new code bugs:
- tcp: add a missing sk_defer_free_flush() in tcp_splice_read()
- tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n
- nf_tables: set last expression in register tracking area
- nft_connlimit: fix memleak if nf_ct_netns_get() fails
- mptcp: fix removing ids bitmap setting
- bonding: use rcu_dereference_rtnl when getting active slave
- fix three cases of sleep in atomic context in drivers: lan966x, gve
- handful of build fixes for esoteric drivers after netdev->dev_addr
was made const
Previous releases - regressions:
- revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
Linux compatibility with USGv6 tests
- procfs: show net device bound packet types
- ipv4: fix ip option filtering for locally generated fragments
- phy: broadcom: hook up soft_reset for BCM54616S
Previous releases - always broken:
- ipv4: raw: lock the socket in raw_bind()
- ipv4: decrease the use of shared IPID generator to decrease the
chance of attackers guessing the values
- procfs: fix cross-netns information leakage in /proc/net/ptype
- ethtool: fix link extended state for big endian
- bridge: vlan: fix single net device option dumping
- ping: fix the sk_bound_dev_if match in ping_lookup"
* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
net: bridge: vlan: fix memory leak in __allowed_ingress
net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
ipv4: remove sparse error in ip_neigh_gw4()
ipv4: avoid using shared IP generator for connected sockets
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: raw: lock the socket in raw_bind()
MAINTAINERS: add missing IPv4/IPv6 header paths
MAINTAINERS: add more files to eth PHY
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
net: bridge: vlan: fix single net device option dumping
net: stmmac: skip only stmmac_ptp_register when resume from suspend
net: stmmac: configure PTP clock source prior to PTP initialization
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
connector/cn_proc: Use task_is_in_init_pid_ns()
pid: Introduce helper task_is_in_init_pid_ns()
gve: Fix GFP flags when allocing pages
net: lan966x: Fix sleep in atomic context when updating MAC table
net: lan966x: Fix sleep in atomic context when injecting frames
ethernet: seeq/ether3: don't write directly to netdev->dev_addr
ethernet: 8390/etherh: don't write directly to netdev->dev_addr
...
Jonathan Corbet [Fri, 21 Jan 2022 00:00:33 +0000 (17:00 -0700)]
docs: Hook the RTLA documents into the kernel docs build
The RTLA documents were added to Documentation/ but never hooked into the
rest of the docs build, leading to a bunch of warnings like:
Documentation/tools/rtla/rtla-osnoise.rst: WARNING: document isn't included in any toctree
Add some basic glue to wire these documents into the build so that they are
available with the rest of the rendered docs. No attempt has been made to
turn the RTLA docs into proper RST files rather than warmed-over man pages;
that is an exercise for the future.
Fixes: d40d48e1f1f2 ("rtla: Add Documentation") Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/877dau555q.fsf@meer.lwn.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Muhammad Usama Anjum [Thu, 27 Jan 2022 16:33:45 +0000 (21:33 +0500)]
selftests/exec: Remove pipe from TEST_GEN_FILES
pipe named FIFO special file is being created in execveat.c to perform
some tests. Makefile doesn't need to do anything with the pipe. When it
isn't found, Makefile generates the following build error:
make: *** No rule to make target
'../tools/testing/selftests/exec/pipe', needed by 'all'. Stop.
Yang Xu [Thu, 27 Jan 2022 09:11:37 +0000 (17:11 +0800)]
selftests/zram: Adapt the situation that /dev/zram0 is being used
If zram-generator package is installed and works, then we can not remove
zram module because zram swap is being used. This case needs a clean zram
environment, change this test by using hot_add/hot_remove interface. So
even zram device is being used, we still can add zram device and remove
them in cleanup.
The two interface was introduced since kernel commit 6566d1a32bf7("zram:
add dynamic device add/remove functionality") in v4.2-rc1. If kernel
supports these two interface, we use hot_add/hot_remove to slove this
problem, if not, just check whether zram is being used or built in, then
skip it on old kernel.
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Yang Xu [Thu, 27 Jan 2022 09:11:36 +0000 (17:11 +0800)]
selftests/zram01.sh: Fix compression ratio calculation
zram01 uses `free -m` to measure zram memory usage. The results are no
sense because they are polluted by all running processes on the system.
We Should only calculate the free memory delta for the current process.
So use the third field of /sys/block/zram<id>/mm_stat to measure memory
usage instead. The file is available since kernel 4.1.
orig_data_size(first): uncompressed size of data stored in this disk.
compr_data_size(second): compressed size of data stored in this disk
mem_used_total(third): the amount of memory allocated for this disk
Also remove useless zram cleanup call in zram_fill_fs and so we don't
need to cleanup zram twice if fails.
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Yang Xu [Thu, 27 Jan 2022 09:11:35 +0000 (17:11 +0800)]
selftests/zram: Skip max_comp_streams interface on newer kernel
Since commit 43209ea2d17a ("zram: remove max_comp_streams internals"), zram
has switched to per-cpu streams. Even kernel still keep this interface for
some reasons, but writing to max_comp_stream doesn't take any effect. So
skip it on newer kernel ie 4.7.
The code that comparing kernel version is from xfstests testsuite ext4/053.
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Shuah Khan [Wed, 26 Jan 2022 20:13:41 +0000 (13:13 -0700)]
docs/kselftest: clarify running mainline tests on stables
Update the document to clarifiy support for running mainline
kselftest on stable releases and the reasons for not removing
test code that can test older kernels.
Laibin Qiu [Thu, 27 Jan 2022 10:00:47 +0000 (18:00 +0800)]
blk-mq: Fix wrong wakeup batch configuration which will cause hang
Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be
awakened") will recalculate wake_batch when incrementing or decrementing
active_queues to avoid wake_batch > hctx_max_depth. At the same time, in
order to not affect performance as much as possible, the minimum wakeup
batch is set to 4. But when the QD is small (such as QD=1), if inc or dec
active_queues increases wakeup batch, that can lead to a hang:
Fix this problem with the following strategies:
QD : >= 32 | < 32
---------------------------------
wakeup batch: 8~4 | 3~1
Tim Yi [Thu, 27 Jan 2022 07:49:53 +0000 (15:49 +0800)]
net: bridge: vlan: fix memory leak in __allowed_ingress
When using per-vlan state, if vlan snooping and stats are disabled,
untagged or priority-tagged ingress frame will go to check pvid state.
If the port state is forwarding and the pvid state is not
learning/forwarding, untagged or priority-tagged frame will be dropped
but skb memory is not freed.
Should free skb when __allowed_ingress returns false.
Fixes: a580c76d534c ("net: bridge: vlan: add per-vlan state") Signed-off-by: Tim Yi <tim.yi@pica8.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso [Thu, 27 Jan 2022 12:40:38 +0000 (13:40 +0100)]
netfilter: nf_tables: remove assignment with no effect in chain blob builder
cppcheck possible warnings:
>> net/netfilter/nf_tables_api.c:2014:2: warning: Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? [uselessAssignmentPtrArg]
ptr += offsetof(struct nft_rule_dp, data);
^
Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Menglong Dong [Thu, 27 Jan 2022 09:13:01 +0000 (17:13 +0800)]
net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.
Eric Dumazet [Thu, 27 Jan 2022 01:34:04 +0000 (17:34 -0800)]
ipv4: remove sparse error in ip_neigh_gw4()
./include/net/route.h:373:48: warning: incorrect type in argument 2 (different base types)
./include/net/route.h:373:48: expected unsigned int [usertype] key
./include/net/route.h:373:48: got restricted __be32 [usertype] daddr
Fixes: 5c9f7c1dfc2e ("ipv4: Add helpers for neigh lookup for nexthop") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220127013404.1279313-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ipv4: less uses of shared IP generator
From: Eric Dumazet <edumazet@google.com>
We keep receiving research reports based on linux IPID generation.
Before breaking part of the Internet by switching to pure
random generator, this series reduces the need for the
shared IP generator for TCP sockets.
====================
Eric Dumazet [Thu, 27 Jan 2022 01:10:22 +0000 (17:10 -0800)]
ipv4: avoid using shared IP generator for connected sockets
ip_select_ident_segs() has been very conservative about using
the connected socket private generator only for packets with IP_DF
set, claiming it was needed for some VJ compression implementations.
As mentioned in this referenced document, this can be abused.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
Before switching to pure random IPID generation and possibly hurt
some workloads, lets use the private inet socket generator.
Not only this will remove one vulnerability, this will also
improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT
Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reported-by: Ray Che <xijiache@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 27 Jan 2022 01:10:21 +0000 (17:10 -0800)]
ipv4: tcp: send zero IPID in SYNACK messages
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and
ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
ctl packets sent by TCP.
It turns out we need to use a similar strategy for SYNACK packets.
By default, they carry IP_DF and IPID==0, but there are ways
to ask them to use the hashed IP ident generator and thus
be used to build off-path attacks.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
One of this way is to force (before listener is started)
echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc
Another way is using forged ICMP ICMP_FRAG_NEEDED
with a very small MTU (like 68) to force a false return from
ip_dont_fragment()
In this patch, ip_build_and_send_pkt() uses the following
heuristics.
1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
can use IP_DF regardless of the listener or route pmtu setting.
2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
we use prandom_u32() generator instead of the IPv4 hashed ident one.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ray Che <xijiache@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Geoff Alexander <alexandg@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>