1) Fix double ESP trailer insertion in IPsec crypto offload if
netif_xmit_frozen_or_stopped is true. From Huy Nguyen.
2) Merge fixup for "remove output_finish indirection from
xfrm_state_afinfo". From Stephen Rothwell.
3) Select CRYPTO_SEQIV for ESP as this is needed for GCM and several
other encryption algorithms. Also modernize the crypto algorithm
selections for ESP and AH, remove those that are maked as "MUST NOT"
and add those that are marked as "MUST" be implemented in RFC 8221.
From Eric Biggers.
Please note the merge conflict between commit:
a7f7f6248d97 ("treewide: replace '---help---' in Kconfig files with 'help'")
from Linus' tree and commits:
7d4e39195925 ("esp, ah: consolidate the crypto algorithm selections") be01369859b8 ("esp, ah: modernize the crypto algorithm selections")
from the ipsec tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jun 2020 19:55:12 +0000 (12:55 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2020-06-18
This series contains fixes to ixgbe, i40e and ice driver.
Ciara fixes up the ixgbe, i40e and ice drivers to protect access when
allocating and freeing the rings. In addition, made use of READ_ONCE
when reading the rings prior to accessing the statistics pointer.
Björn fixes a crash where the receive descriptor ring allocation was
moved to a different function, which broke the ethtool set_ringparam()
hook.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 19 Jun 2020 19:40:57 +0000 (12:40 -0700)]
Merge tag 'drm-fixes-2020-06-19' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just i915 and amd here.
i915 has some workaround movement so they get applied at the right
times, and a timeslicing fix, along with some display fixes.
AMD has a few display floating point fix and a devcgroup fix for
amdkfd.
i915:
- Fix for timeslicing and virtual engines/unpremptable requests (+ 1
dependency patch)
- Fixes into TypeC register programming and interrupt storm detecting
- Disable DIP on MST ports with the transcoder clock still on
- Avoid missing GT workarounds at reset for HSW and older gens
- Fix for unwinding multiple requests missing force restore
- Fix encoder type check for DDI vswing sequence
- Build warning fixes
amdgpu:
- Fix kvfree/kfree mixup
- Fix hawaii device id in powertune configuration
- Display FP fixes
- Documentation fixes
amdkfd:
- devcgroup check fix"
* tag 'drm-fixes-2020-06-19' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/amdgpu: fix documentation around busy_percentage
drm/amdgpu/pm: update comment to clarify Overdrive interfaces
drm/amdkfd: Use correct major in devcgroup check
drm/i915/display: Fix the encoder type check
drm/i915/icl+: Fix hotplug interrupt disabling after storm detection
drm/i915/gt: Move gen4 GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move ilk GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move snb GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move vlv GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move ivb GT workarounds from init_clock_gating to workarounds
drm/i915/gt: Move hsw GT workarounds from init_clock_gating to workarounds
drm/i915/icl: Disable DIP on MST ports with the transcoder clock still on
drm/i915/gt: Incrementally check for rewinding
drm/i915/tc: fix the reset of ln0
drm/i915/gt: Prevent timeslicing into unpreemptable requests
drm/i915/selftests: Restore to default heartbeat
drm/i915: work around false-positive maybe-uninitialized warning
drm/i915/pmu: avoid an maybe-uninitialized warning
drm/i915/gt: Incorporate the virtual engine into timeslicing
drm/amd/display: Rework dsc to isolate FPU operations
...
Linus Torvalds [Fri, 19 Jun 2020 19:25:04 +0000 (12:25 -0700)]
Merge tag 'ceph-for-5.8-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"An important follow-up for replica reads support that went into -rc1
and two target_copy() fixups"
* tag 'ceph-for-5.8-rc2' of git://github.com/ceph/ceph-client:
libceph: don't omit used_replica in target_copy()
libceph: don't omit recovery_deletes in target_copy()
libceph: move away from global osd_req_flags
Linus Torvalds [Fri, 19 Jun 2020 19:19:12 +0000 (12:19 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Unfortunately, we still have a number of outstanding issues so there
will be more fixes to come, but this lot are a good start.
- Fix handling of watchpoints triggered by uaccess routines
- Fix initialisation of gigantic pages for CMA buffers
- Raise minimum clang version for BTI to avoid miscompilation
- Fix data race in SVE vector length configuration code
- Ensure address tags are ignored in kern_addr_valid()
- Dump register state on fatal BTI exception
- kexec_file() cleanup to use struct_size() macro"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
arm64: kexec_file: Use struct_size() in kmalloc()
arm64: mm: reserve hugetlb CMA after numa_init
arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support
arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n
arm64: pgtable: Clear the GP bit for non-executable kernel pages
arm64: mm: reset address tag set by kasan sw tagging
arm64: traps: Dump registers prior to panic() in bad_mode()
arm64/sve: Eliminate data races on sve_default_vl
docs/arm64: Fix typo'd #define in sve.rst
arm64: remove TEXT_OFFSET randomization
Linus Torvalds [Fri, 19 Jun 2020 18:45:03 +0000 (11:45 -0700)]
Merge tag 'overflow-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull flex-array size helper from Kees Cook:
"During the treewide clean-ups of zero-length "flexible arrays", the
struct_size() helper was heavily used, but it was noticed that many
times it would have been nice to have an additional helper to get the
size of just the flexible array itself.
This need appears to be even more common when cleaning up the 1-byte
array "flexible arrays", so Gustavo implemented it.
I'd love to get this landed early so it can be used during the v5.9
dev cycle to ease the 1-byte array cleanups."
* tag 'overflow-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
overflow.h: Add flex_array_size() helper
Linus Torvalds [Fri, 19 Jun 2020 18:39:57 +0000 (11:39 -0700)]
Merge tag 'perf-tools-fixes-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tooling fixes from Arnaldo Carvalho de Melo:
- Update various UAPI headers, some automatically adding support for a
new MSR and the faccess2 syscall.
- Fix corner case NULL deref in the histograms code.
- Fix corner case NULL deref in 'perf stat' aggregation code.
- Fix array pointer deref and old style declaration in the parsing of
events.
- Fix segfault when processing ZSTD compressed perf.data files in 'perf
script' due to lack of initialization of the ZSTD library.
- Handle __attribute__((user)) in libtraceevent fixing the parsing of
syscall tracepoints with user buffers.
- Make libtraevent aware of __builtin_expect() appearing in tracepoint
fields.
- Make the BPF prologue generation use bpf_probe_read_{user,kernel}().
- Fix the '@user' attribute parsing in kprobes variables in 'perf
probe'.
- Fix error message when asking for -fsanitize=address without required
libraries.
* tag 'perf-tools-fixes-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (22 commits)
perf build: Fix error message when asking for -fsanitize=address without required libraries
tools lib traceevent: Add handler for __builtin_expect()
tools lib traceevent: Handle __attribute__((user)) in field names
tools lib traceevent: Add append() function helper for appending strings
tools headers UAPI: Sync linux/fs.h with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf script: Initialize zstd_data
perf pmu: Remove unused declaration
perf parse-events: Fix an old style declaration
perf parse-events: Fix an incompatible pointer
perf bpf: Fix bpf prologue generation
perf probe: Fix user attribute access in kprobes
perf stat: Fix NULL pointer dereference
perf report: Fix NULL pointer dereference in hists__fprintf_nr_sample_events()
tools headers UAPI: Sync kvm.h headers with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
perf beauty: Add support to STATX_MNT_ID in the 'statx' syscall 'mask' argument
tools headers uapi: Sync linux/stat.h with the kernel sources
...
Fan Guo [Fri, 12 Jun 2020 06:38:24 +0000 (14:38 +0800)]
RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
If ib_dma_mapping_error() returns non-zero value,
ib_mad_post_receive_mads() will jump out of loops and return -ENOMEM
without freeing mad_priv. Fix this memory-leak problem by freeing mad_priv
in this case.
Call traces are different but the crash is imminent. The problem was
blindly bisected to the commit 041bc42ce2d0 ("KVM: VMX: Micro-optimize
vmexit time when not exposing PMU"). It was also confirmed that the
issue goes away if PMU is exposed to the guest.
With some instrumentation of the guest we can see what is being switched
(when we do atomic_switch_perf_msrs()):
The current guess is that PEBS (MSR_IA32_PEBS_ENABLE, 0x3f1) is to blame.
Regardless of whether PMU is exposed to the guest or not, PEBS needs to
be disabled upon switch.
Wolfram Sang [Mon, 15 Jun 2020 07:58:13 +0000 (09:58 +0200)]
video: backlight: tosa_lcd: convert to use i2c_new_client_device()
Move away from the deprecated API and return the shiny new ERRPTR where
useful.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Mon, 15 Jun 2020 07:58:12 +0000 (09:58 +0200)]
x86/platform/intel-mid: convert to use i2c_new_client_device()
Move away from the deprecated API and return the shiny new ERRPTR where
useful.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Mon, 15 Jun 2020 07:58:11 +0000 (09:58 +0200)]
drm: encoder_slave: use new I2C API
i2c_new_client() is deprecated, use the replacement
i2c_new_client_device(). Also, we have a helper to check if a driver is
bound. Use it to simplify the code. Note that this changes the errno for
a failed device creation from ENOMEM to ENODEV. No callers currently
interpret this errno, though, so we use this condensed error check.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Björn Töpel [Fri, 12 Jun 2020 11:47:31 +0000 (13:47 +0200)]
i40e: fix crash when Rx descriptor count is changed
When the AF_XDP buffer allocator was introduced, the Rx SW ring
"rx_bi" allocation was moved from i40e_setup_rx_descriptors()
function, and was instead done in the i40e_configure_rx_ring()
function.
This broke the ethtool set_ringparam() hook for changing the Rx
descriptor count, which was relying on i40e_setup_rx_descriptors() to
handle the allocation.
Fix this by adding an explicit i40e_alloc_rx_bi() call to
i40e_set_ringparam().
Fixes: be1222b585fd ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ciara Loftus [Tue, 9 Jun 2020 13:19:45 +0000 (13:19 +0000)]
ice: protect ring accesses with WRITE_ONCE
The READ_ONCE macro is used when reading rings prior to accessing the
statistics pointer. The corresponding WRITE_ONCE usage when allocating and
freeing the rings to ensure protected access was not in place. Introduce
this.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ciara Loftus [Tue, 9 Jun 2020 13:19:44 +0000 (13:19 +0000)]
i40e: protect ring accesses with READ- and WRITE_ONCE
READ_ONCE should be used when reading rings prior to accessing the
statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
usage when allocating and freeing the rings, to ensure protected access.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ciara Loftus [Tue, 9 Jun 2020 13:19:43 +0000 (13:19 +0000)]
ixgbe: protect ring accesses with READ- and WRITE_ONCE
READ_ONCE should be used when reading rings prior to accessing the
statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
usage when allocating and freeing the rings, to ensure protected access.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Thu, 18 Jun 2020 05:23:25 +0000 (22:23 -0700)]
net: increment xmit_recursion level in dev_direct_xmit()
Back in commit f60e5990d9c1 ("ipv6: protect skb->sk accesses
from recursive dereference inside the stack") Hannes added code
so that IPv6 stack would not trust skb->sk for typical cases
where packet goes through 'standard' xmit path (__dev_queue_xmit())
Alas af_packet had a dev_direct_xmit() path that was not
dealing yet with xmit_recursion level.
Also change sk_mc_loop() to dump a stack once only.
f60e5990d9c1 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 18 Jun 2020 03:42:44 +0000 (20:42 -0700)]
net: dsa: bcm_sf2: Fix node reference count
of_find_node_by_name() will do an of_node_put() on the "from" argument.
With CONFIG_OF_DYNAMIC enabled which checks for device_node reference
counts, we would be getting a warning like this:
Fix this by adding a of_node_get() to increment the reference count
prior to the call.
Fixes: afa3b592953b ("net: dsa: bcm_sf2: Ensure correct sub-node is parsed") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3b33583265ed ("net: Add fraglist GRO/GSO feature flags") missed
an entry for NETIF_F_GSO_FRAGLIST in netdev_features_strings array. As
a result, fraglist GSO feature is not shown in 'ethtool -k' output and
can't be toggled on/off.
The fix is trivial.
Fixes: 3b33583265ed ("net: Add fraglist GRO/GSO feature flags") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
David Christensen [Wed, 17 Jun 2020 18:51:17 +0000 (11:51 -0700)]
tg3: driver sleeps indefinitely when EEH errors exceed eeh_max_freezes
The driver function tg3_io_error_detected() calls napi_disable twice,
without an intervening napi_enable, when the number of EEH errors exceeds
eeh_max_freezes, resulting in an indefinite sleep while holding rtnl_lock.
Add check for pcierr_recovery which skips code already executed for the
"Frozen" state.
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin [Wed, 17 Jun 2020 17:00:23 +0000 (22:30 +0530)]
bareudp: Fixed multiproto mode configuration
Code to handle multiproto configuration is missing.
Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS") Signed-off-by: Martin <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jun 2020 03:27:42 +0000 (20:27 -0700)]
Merge branch 's390-qeth-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2020-06-17
please apply the following patch series for qeth to netdev's net tree.
The first patch fixes a regression in the error handling for a specific
cmd type. I have some follow-ups queued up for net-next to clean this
up properly...
The second patch fine-tunes the HW offload restrictions that went in
with this merge window. In some setups we don't need to apply them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 17 Jun 2020 14:54:53 +0000 (16:54 +0200)]
s390/qeth: let isolation mode override HW offload restrictions
When a device is configured with ISOLATION_MODE_FWD, traffic never goes
through the internal switch. Don't apply the offload restrictions in
this case.
Fixes: c619e9a6f52f ("s390/qeth: don't use restricted offloads for local traffic") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 17 Jun 2020 14:54:52 +0000 (16:54 +0200)]
s390/qeth: fix error handling for isolation mode cmds
Current(?) OSA devices also store their cmd-specific return codes for
SET_ACCESS_CONTROL cmds into the top-level cmd->hdr.return_code.
So once we added stricter checking for the top-level field a while ago,
none of the error logic that rolls back the user's configuration to its
old state is applied any longer.
For this specific cmd, go back to the old model where we peek into the
cmd structure even though the top-level field indicated an error.
Fixes: 686c97ee29c8 ("s390/qeth: fix error handling in adapter command callbacks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mptcp: cope with syncookie on MP_JOINs
Currently syncookies on MP_JOIN connections are not handled correctly: the
connections fallback to TCP and are kept alive instead of resetting them at
fallback time.
The first patch propagates the required information up to syn_recv_sock time,
and the 2nd patch addresses the unifying the error path for all MP_JOIN
requests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 17 Jun 2020 10:08:57 +0000 (12:08 +0200)]
mptcp: drop MP_JOIN request sock on syn cookies
Currently any MPTCP socket using syn cookies will fallback to
TCP at 3rd ack time. In case of MP_JOIN requests, the RFC mandate
closing the child and sockets, but the existing error paths
do not handle the syncookie scenario correctly.
Address the issue always forcing the child shutdown in case of
MP_JOIN fallback.
Fixes: ae2dd7164943 ("mptcp: handle tcp fallback when using syn cookies") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 17 Jun 2020 10:08:56 +0000 (12:08 +0200)]
mptcp: cache msk on MP_JOIN init_req
The msk ownership is transferred to the child socket at
3rd ack time, so that we avoid more lookups later. If the
request does not reach the 3rd ack, the MSK reference is
dropped at request sock release time.
As a side effect, fallback is now tracked by a NULL msk
reference instead of zeroed 'mp_join' field. This will
simplify the next patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The arp request address is error, this is because fib_table_lookup in
fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of
the fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.
Here I add a check of whether this is RT_TABLE_MAIN to solve this problem.
Fixes: 3bfd847203c6 ("net: Use passed in table for nexthop lookups") Signed-off-by: guodeqing <geffrey.guo@huawei.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jun 2020 03:20:46 +0000 (20:20 -0700)]
Merge branch 'sja1105-fixes'
Vladimir Oltean says:
====================
Fix VLAN checks for SJA1105 DSA tc-flower filters
This fixes a ridiculous situation where the driver, in VLAN-unaware
mode, would refuse accepting any tc filter:
tc filter replace dev sw1p3 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate (...)
Error: sja1105: Can only gate based on {DMAC, VID, PCP}.
tc filter replace dev sw1p3 ingress protocol 802.1Q flower skip_sw \
vlan_id 1 vlan_prio 0 dst_mac 42:be:24:9b:76:20 \
action gate (...)
Error: sja1105: Can only gate based on DMAC.
So, without changing the VLAN awareness state, it says it doesn't want
VLAN-aware rules, and it doesn't want VLAN-unaware rules either. One
would say it's in Schrodinger's state...
Now, the situation has been made worse by commit 7f14937facdc ("net:
dsa: sja1105: keep the VLAN awareness state in a driver variable"),
which made VLAN awareness a ternary attribute, but after inspecting the
code from before that patch with a truth table, it looks like the
logical bug was there even before.
While attempting to fix this, I also noticed some leftover debugging
code in one of the places that needed to be fixed. It would have
appeared in the context of patch 3/3 anyway, so I decided to create a
patch that removes it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 16 Jun 2020 23:58:43 +0000 (02:58 +0300)]
net: dsa: sja1105: fix checks for VLAN state in gate action
This action requires the VLAN awareness state of the switch to be of the
same type as the key that's being added:
- If the switch is unaware of VLAN, then the tc filter key must only
contain the destination MAC address.
- If the switch is VLAN-aware, the key must also contain the VLAN ID and
PCP.
But this check doesn't work unless we verify the VLAN awareness state on
both the "if" and the "else" branches.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 16 Jun 2020 23:58:42 +0000 (02:58 +0300)]
net: dsa: sja1105: fix checks for VLAN state in redirect action
This action requires the VLAN awareness state of the switch to be of the
same type as the key that's being added:
- If the switch is unaware of VLAN, then the tc filter key must only
contain the destination MAC address.
- If the switch is VLAN-aware, the key must also contain the VLAN ID and
PCP.
But this check doesn't work unless we verify the VLAN awareness state on
both the "if" and the "else" branches.
Fixes: dfacc5a23e22 ("net: dsa: sja1105: support flow-based redirection via virtual links") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 16 Jun 2020 23:58:41 +0000 (02:58 +0300)]
net: dsa: sja1105: remove debugging code in sja1105_vl_gate
This shouldn't be there.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Jun 2020 03:17:49 +0000 (20:17 -0700)]
Merge branch 'act_gate-fixes'
Davide Caratti says:
====================
two fixes for 'act_gate' control plane
- patch 1/2 attempts to fix the error path of tcf_gate_init() when users
try to configure 'act_gate' rules with wrong parameters
- patch 2/2 is a follow-up of a recent fix for NULL dereference in
the error path of tcf_gate_init()
further work will introduce a tdc test for 'act_gate'.
changes since v2:
- fix undefined behavior in patch 1/2
- improve comment in patch 2/2
changes since v1:
coding style fixes in patch 1/2 and 2/2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Tue, 16 Jun 2020 20:25:21 +0000 (22:25 +0200)]
net/sched: act_gate: fix configuration of the periodic timer
assigning a dummy value of 'clock_id' to avoid cancellation of the cycle
timer before its initialization was a temporary solution, and we still
need to handle the case where act_gate timer parameters are changed by
commands like the following one:
# tc action replace action gate <parameters>
the fix consists in the following items:
1) remove the workaround assignment of 'clock_id', and init the list of
entries before the first error path after IDR atomic check/allocation
2) validate 'clock_id' earlier: there is no need to do IDR atomic
check/allocation if we know that 'clock_id' is a bad value
3) use a dedicated function, 'gate_setup_timer()', to ensure that the
timer is cancelled and re-initialized on action overwrite, and also
ensure we initialize the timer in the error path of tcf_gate_init()
v3: improve comment in the error path of tcf_gate_init() (thanks to
Vladimir Oltean)
v2: avoid 'goto' in gate_setup_timer (thanks to Cong Wang)
CC: Ivan Vecera <ivecera@redhat.com> Fixes: a01c245438c5 ("net/sched: fix a couple of splats in the error path of tfc_gate_init()") Fixes: a51c328df310 ("net: qos: introduce a gate control flow action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
this is caused by the test on 'cycletime_ext', that is still unassigned
when the action is newly created. This makes the action .init() return 0
without calling tcf_idr_insert(), hence the UAF + crash.
rework the logic that prevents zero values of cycle-time, as follows:
1) 'tcfg_cycletime_ext' seems to be unused in the action software path,
and it was already possible by other means to obtain non-zero
cycletime and zero cycletime-ext. So, removing that test should not
cause any damage.
2) while at it, we must prevent overwriting configuration data with wrong
ones: use a temporary variable for 'tcfg_cycletime', and validate it
preserving the original semantic (that allowed computing the cycle
time as the sum of all intervals, when not specified by
TCA_GATE_CYCLE_TIME).
3) remove the test on 'tcfg_cycletime', no more useful, and avoid
returning -EFAULT, which did not seem an appropriate return value for
a wrong netlink attribute.
v3: fix uninitialized 'cycletime' (thanks to Vladimir Oltean)
v2: remove useless 'return;' at the end of void gate_get_start_time()
Fixes: a51c328df310 ("net: qos: introduce a gate control flow action") CC: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Tue, 16 Jun 2020 16:51:51 +0000 (16:51 +0000)]
ip_tunnel: fix use-after-free in ip_tunnel_lookup()
In the datapath, the ip_tunnel_lookup() is used and it internally uses
fallback tunnel device pointer, which is fb_tunnel_dev.
This pointer variable should be set to NULL when a fb interface is deleted.
But there is no routine to set fb_tunnel_dev pointer to NULL.
So, this pointer will be still used after interface is deleted and
it eventually results in the use-after-free problem.
Test commands:
ip netns add A
ip netns add B
ip link add eth0 type veth peer name eth1
ip link set eth0 netns A
ip link set eth1 netns B
ip netns exec A ip link set lo up
ip netns exec A ip link set eth0 up
ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
remote 10.0.0.2
ip netns exec A ip link set gre1 up
ip netns exec A ip a a 10.0.100.1/24 dev gre1
ip netns exec A ip a a 10.0.0.1/24 dev eth0
ip netns exec B ip link set lo up
ip netns exec B ip link set eth1 up
ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
remote 10.0.0.1
ip netns exec B ip link set gre1 up
ip netns exec B ip a a 10.0.100.2/24 dev gre1
ip netns exec B ip a a 10.0.0.2/24 dev eth1
ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
ip netns del B
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Tue, 16 Jun 2020 16:04:00 +0000 (16:04 +0000)]
ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()
In the datapath, the ip6gre_tunnel_lookup() is used and it internally uses
fallback tunnel device pointer, which is fb_tunnel_dev.
This pointer variable should be set to NULL when a fb interface is deleted.
But there is no routine to set fb_tunnel_dev pointer to NULL.
So, this pointer will be still used after interface is deleted and
it eventually results in the use-after-free problem.
Test commands:
ip netns add A
ip netns add B
ip link add eth0 type veth peer name eth1
ip link set eth0 netns A
ip link set eth1 netns B
ip netns exec A ip link set lo up
ip netns exec A ip link set eth0 up
ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
remote fc:0::2
ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
ip netns exec A ip link set ip6gre1 up
ip netns exec A ip -6 a a fc:0::1/64 dev eth0
ip netns exec A ip link set ip6gre0 up
ip netns exec B ip link set lo up
ip netns exec B ip link set eth1 up
ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
remote fc:0::1
ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
ip netns exec B ip link set ip6gre1 up
ip netns exec B ip -6 a a fc:0::2/64 dev eth1
ip netns exec B ip link set ip6gre0 up
ip netns exec A ping fc:100::2 -s 60000 &
ip netns del B
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Tue, 16 Jun 2020 15:52:05 +0000 (15:52 +0000)]
net: core: reduce recursion limit value
In the current code, ->ndo_start_xmit() can be executed recursively only
10 times because of stack memory.
But, in the case of the vxlan, 10 recursion limit value results in
a stack overflow.
In the current code, the nested interface is limited by 8 depth.
There is no critical reason that the recursion limitation value should
be 10.
So, it would be good to be the same value with the limitation value of
nesting interface depth.
Test commands:
ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
ip link set vxlan10 up
ip a a 192.168.10.1/24 dev vxlan10
ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent
for i in {9..0}
do
let A=$i+1
ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
ip link set vxlan$i up
ip a a 192.168.$i.1/24 dev vxlan$i
ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
done
hping3 192.168.10.2 -2 -d 60000
Fixes: 11a766ce915f ("net: Increase xmit RECURSION_LIMIT to 10.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If call_netdevice_notifiers() failed, then rollback_registered()
calls netdev_unregister_kobject() which holds the kobject. The
reference cannot be put because the netdev won't be add to todo
list, so it will leads a memleak, we need put the reference to
avoid memleak.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Tue, 16 Jun 2020 08:31:40 +0000 (10:31 +0200)]
net: ethernet: mvneta: Add 2500BaseX support for SoCs without comphy
The older SoCs like Armada XP support a 2500BaseX mode in the datasheets
referred to as DR-SGMII (Double rated SGMII) or HS-SGMII (High Speed
SGMII). This is an upclocked 1000BaseX mode, thus
PHY_INTERFACE_MODE_2500BASEX is the appropriate mode define for it.
adding support for it merely means writing the correct magic value into
the MVNETA_SERDES_CFG register.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Tue, 16 Jun 2020 08:31:39 +0000 (10:31 +0200)]
net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy
The MVNETA_SERDES_CFG register is only available on older SoCs like the
Armada XP. On newer SoCs like the Armada 38x the fields are moved to
comphy. This patch moves the writes to this register next to the comphy
initialization, so that depending on the SoC either comphy or
MVNETA_SERDES_CFG is configured.
With this we no longer write to the MVNETA_SERDES_CFG on SoCs where it
doesn't exist.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Atish Patra [Wed, 17 Jun 2020 20:37:32 +0000 (13:37 -0700)]
RISC-V: Acquire mmap lock before invoking walk_page_range
As per walk_page_range documentation, mmap lock should be acquired by the
caller before invoking walk_page_range. mmap_assert_locked gets triggered
without that. The details can be found here.
Yash Shah [Tue, 16 Jun 2020 14:03:06 +0000 (19:33 +0530)]
RISC-V: Don't allow write+exec only page mapping request in mmap
As per the table 4.4 of version "20190608-Priv-MSU-Ratified" of the
RISC-V instruction set manual[0], the PTE permission bit combination of
"write+exec only" is reserved for future use. Hence, don't allow such
mapping request in mmap call.
An issue is been reported by David Abdurachmanov, that while running
stress-ng with "sysbadaddr" argument, RCU stalls are observed on RISC-V
specific kernel.
This issue arises when the stress-sysbadaddr request for pages with
"write+exec only" permission bits and then passes the address obtain
from this mmap call to various system call. For the riscv kernel, the
mmap call should fail for this particular combination of permission bits
since it's not valid.
Signed-off-by: Yash Shah <yash.shah@sifive.com> Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
[Palmer: Refer to the latest ISA specification at the only link I could
find, and update the terminology.] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Dave Airlie [Thu, 18 Jun 2020 23:45:47 +0000 (09:45 +1000)]
Merge tag 'drm-intel-fixes-2020-06-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix for timeslicing and virtual engines/unpremptable requests
(+ 1 dependency patch)
- Fixes into TypeC register programming and interrupt storm detecting
- Disable DIP on MST ports with the transcoder clock still on
- Avoid missing GT workarounds at reset for HSW and older gens
- Fix for unwinding multiple requests missing force restore
- Fix encoder type check for DDI vswing sequence
- Build warning fixes
Linus Torvalds [Thu, 18 Jun 2020 19:35:51 +0000 (12:35 -0700)]
Merge branch 'hch' (maccess patches from Christoph Hellwig)
Merge non-faulting memory access cleanups from Christoph Hellwig:
"Andrew and I decided to drop the patches implementing your suggested
rename of the probe_kernel_* and probe_user_* helpers from -mm as
there were way to many conflicts.
After -rc1 might be a good time for this as all the conflicts are
resolved now"
This also adds a type safety checking patch on top of the renaming
series to make the subtle behavioral difference between 'get_user()' and
'get_kernel_nofault()' less potentially dangerous and surprising.
* emailed patches from Christoph Hellwig <hch@lst.de>:
maccess: make get_kernel_nofault() check for minimal type compatibility
maccess: rename probe_kernel_address to get_kernel_nofault
maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Linus Torvalds [Thu, 18 Jun 2020 19:10:37 +0000 (12:10 -0700)]
maccess: make get_kernel_nofault() check for minimal type compatibility
Now that we've renamed probe_kernel_address() to get_kernel_nofault()
and made it look and behave more in line with get_user(), some of the
subtle type behavior differences end up being more obvious and possibly
dangerous.
When you do
get_user(val, user_ptr);
the type of the access comes from the "user_ptr" part, and the above
basically acts as
val = *user_ptr;
by design (except, of course, for the fact that the actual dereference
is done with a user access).
Note how in the above case, the type of the end result comes from the
pointer argument, and then the value is cast to the type of 'val' as
part of the assignment.
So the type of the pointer is ultimately the more important type both
for the access itself.
But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
differently. When you do
get_kernel_nofault(val, kernel_ptr);
it behaves like
val = *(typeof(val) *)kernel_ptr;
except, of course, for the fact that the actual dereference is done with
exception handling so that a faulting access is suppressed and returned
as the error code.
But note how different the casting behavior of the two superficially
similar accesses are: one does the actual access in the size of the type
the pointer points to, while the other does the access in the size of
the target, and ignores the pointer type entirely.
Actually changing get_kernel_nofault() to act like get_user() is almost
certainly the right thing to do eventually, but in the meantime this
patch adds logit to at least verify that the pointer type is compatible
with the type of the result.
In many cases, this involves just casting the pointer to 'void *' to
make it obvious that the type of the pointer is not the important part.
It's not how 'get_user()' acts, but at least the behavioral difference
is now obvious and explicit.
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Max Gurtovoy [Wed, 17 Jun 2020 13:02:30 +0000 (16:02 +0300)]
RDMA/mlx5: Fix integrity enabled QP creation
create_flags checks was refactored and broke the creation on integrity
enabled QPs and actually broke the NVMe/RDMA and iSER ULP's when using
mlx5 driven devices.
Fixes: 2978975ce7f1 ("RDMA/mlx5: Process create QP flags in one place") Link: https://lore.kernel.org/r/20200617130230.2846915-1-leon@kernel.org Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Thu, 18 Jun 2020 11:25:06 +0000 (14:25 +0300)]
RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
Like any other QP type, rely on FW for the RAW_PACKET QPs to decide if ECE
is supported or not. This fixes an inability to create RAW_PACKET QPs with
latest rdma-core with the ECE support.
Leon Romanovsky [Wed, 17 Jun 2020 13:01:48 +0000 (16:01 +0300)]
RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
destroy_qp_common is called for flows where QP is already created by
HW. While it is called from IB/core, the ibqp.* fields will be fully
initialized, but it is not the case if this function is called during QP
creation.
Don't rely on ibqp fields as much as possible and initialize
send_cq/recv_cq as temporal solution till all drivers will be converted to
IB/core QP allocation scheme.
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 5372 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 5372 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
mlx5_core_put_rsc+0x70/0x80
destroy_resource_common+0x8e/0xb0
mlx5_core_destroy_qp+0xaf/0x1d0
mlx5_ib_destroy_qp+0xeb0/0x1460
ib_destroy_qp_user+0x2d5/0x7d0
create_qp+0xed3/0x2130
ib_uverbs_create_qp+0x13e/0x190
? ib_uverbs_ex_create_qp
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Luc Van Oostenryck [Wed, 17 Jun 2020 22:02:26 +0000 (00:02 +0200)]
sparse: use identifiers to define address spaces
Currently, address spaces in warnings are displayed as '<asn:X>' with
'X' being the address space's arbitrary number.
But since sparse v0.6.0-rc1 (late December 2018), sparse allows you to
define the address spaces using an identifier instead of a number. This
identifier is then directly used in the warnings.
So, use the identifiers '__user', '__iomem', '__percpu' & '__rcu' for
the corresponding address spaces. The default address space, __kernel,
being not displayed in warnings, stays defined as '0'.
With this change, warnings that used to be displayed as:
cast removes address space '<asn:1>' of expression
... void [noderef] <asn:2> *
will now be displayed as:
cast removes address space '__user' of expression
... void [noderef] __iomem *
This also moves the __kernel annotation to be the first one, since it is
quite different from the others because it's the default one, and so:
- it's never displayed
- it's normally not needed, nor in type annotations, nor in cast
between address spaces. The only time it's needed is when it's
combined with a typeof to express "the same type as this one but
without the address space"
- it can't be defined with a name, '0' must be used.
So, it seemed strange to me to have it in the middle of the other
ones.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenzhong Duan [Thu, 18 Jun 2020 03:21:25 +0000 (11:21 +0800)]
spi: spidev: fix a potential use-after-free in spidev_release()
If an spi device is unbounded from the driver before the release
process, there will be an NULL pointer reference when it's
referenced in spi_slave_abort().
Fix it by checking it's already freed before reference.
Qiushi Wu [Sat, 13 Jun 2020 20:51:58 +0000 (15:51 -0500)]
ASoC: rockchip: Fix a reference count leak.
Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count if pm_runtime_put is not called in
error handling paths. Call pm_runtime_put if pm_runtime_get_sync fails.
Fixes: fc05a5b22253 ("ASoC: rockchip: add support for pdm controller") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20200613205158.27296-1-wu000273@umn.edu Signed-off-by: Mark Brown <broonie@kernel.org>
Zheng Bin [Thu, 18 Jun 2020 04:21:37 +0000 (12:21 +0800)]
loop: replace kill_bdev with invalidate_bdev
When a filesystem is mounted on a loop device and on a loop ioctl
LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting
destroyed.
kill_bdev
truncate_inode_pages
truncate_inode_pages_range
do_invalidatepage
block_invalidatepage
discard_buffer -->clear BH_Mapped flag
sb_bread
__bread_gfp
bh = __getblk_gfp
-->discard_buffer clear BH_Mapped flag
__bread_slow
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON
Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kai-Heng Feng [Wed, 3 Jun 2020 07:48:19 +0000 (15:48 +0800)]
libata: Use per port sync for detach
Commit 130f4caf145c ("libata: Ensure ata_port probe has completed before
detach") may cause system freeze during suspend.
Using async_synchronize_full() in PM callbacks is wrong, since async
callbacks that are already scheduled may wait for not-yet-scheduled
callbacks, causes a circular dependency.
Instead of using big hammer like async_synchronize_full(), use async
cookie to make sure port probe are synced, without affecting other
scheduled PM callbacks.
Fixes: 130f4caf145c ("libata: Ensure ata_port probe has completed before detach") Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: John Garry <john.garry@huawei.com> BugLink: https://bugs.launchpad.net/bugs/1867983 Signed-off-by: Jens Axboe <axboe@kernel.dk>
But clearing REQ_F_NEED_CLEANUP might be unsafe. The io request may
already have been completed, and then io_complete_rw_iopoll()
and io_complete_rw() will be called, both of which will also modify
req->flags if needed. This causes a race condition, with concurrent
non-atomic modification of req->flags.
To eliminate this race, in io_read() or io_write(), if io request is
submitted successfully, we don't remove REQ_F_NEED_CLEANUP flag. If
REQ_F_NEED_CLEANUP is set, we'll leave __io_req_aux_free() to the
iovec cleanup work correspondingly.
Cc: stable@vger.kernel.org Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yangyang Li [Tue, 16 Jun 2020 13:39:38 +0000 (21:39 +0800)]
RDMA/hns: Fix an cmd queue issue when resetting
If a IMP reset caused by some hardware errors and hns RoCE driver reset
occurred at the same time, there is a possiblity that the IMP will stop
dealing with command and users can't use the hardware. The logs are as
follows:
hns3 0000:fd:00.1: cleaned 0, need to clean 1
hns3 0000:fd:00.1: firmware version query failed -11
hns3 0000:fd:00.1: Cmd queue init failed
hns3 0000:fd:00.1: Upgrade reset level
hns3 0000:fd:00.1: global reset interrupt
The hns NIC driver divides the reset process into 3 status:
initialization, hardware resetting and softwaring restting. RoCE driver
gets reset status by interfaces provided by NIC driver and commands will
not be sent to the IMP if the driver is in any above status. The main
reason for this issue is that there is a time gap between status 1 and 2,
if the RoCE driver sends commands to the IMP during this gap, the IMP will
stop working because it is not ready.
To eliminate the time gap, the hns NIC driver has added a new interface in
commit a4de02287abb9 ("net: hns3: provide .get_cmdq_stat interface for the
client"), so RoCE driver can ensure that no commands will be sent during
resetting.
Yangyang Li [Tue, 16 Jun 2020 13:37:09 +0000 (21:37 +0800)]
RDMA/hns: Fix a calltrace when registering MR from userspace
ibmr.device is assigned after MR is successfully registered, but both
write_mtpt() and frmr_write_mtpt() accesses it during the mr registration
process, which may cause the following error when trying to register MR in
userspace and pbl_hop_num is set to 0.
Solve above issue by adding a pointer of structure hns_roce_dev as a
parameter of write_mtpt() and frmr_write_mtpt(), so that both of these
functions can access it before finishing MR's registration.
Tiezhu Yang [Thu, 18 Jun 2020 02:06:01 +0000 (10:06 +0800)]
perf build: Fix error message when asking for -fsanitize=address without required libraries
When build perf with ASan or UBSan, if libasan or libubsan can not find,
the feature-glibc is 0 and there exists the following error log which is
wrong, because we can find gnu/libc-version.h in /usr/include,
glibc-devel is also installed.
[yangtiezhu@linux perf]$ make DEBUG=1 EXTRA_CFLAGS='-fno-omit-frame-pointer -fsanitize=address'
BUILD: Doing 'make -j4' parallel build
HOSTCC fixdep.o
HOSTLD fixdep-in.o
LINK fixdep
<stdin>:1:0: warning: -fsanitize=address and -fsanitize=kernel-address are not supported for this target
<stdin>:1:0: warning: -fsanitize=address not supported for this target
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ OFF ]
... gtk2: [ OFF ]
... libaudit: [ OFF ]
... libbfd: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ OFF ]
... libaio: [ OFF ]
... libzstd: [ OFF ]
... disassembler-four-args: [ OFF ]
Makefile.config:393: *** No gnu/libc-version.h found, please install glibc-dev[el]. Stop.
Makefile.perf:224: recipe for target 'sub-make' failed
make[1]: *** [sub-make] Error 2
Makefile:69: recipe for target 'all' failed
make: *** [all] Error 2
[yangtiezhu@linux perf]$ ls /usr/include/gnu/libc-version.h
/usr/include/gnu/libc-version.h
After install libasan and libubsan, the feature-glibc is 1 and the build
process is success, so the cause is related with libasan or libubsan, we
should check them and print an error log to reflect the reality.
Auto-detecting system features:
... dwarf: [ OFF ]
... dwarf_getlocations: [ OFF ]
... glibc: [ OFF ]
... gtk2: [ OFF ]
... libbfd: [ OFF ]
... libcap: [ OFF ]
... libelf: [ OFF ]
... libnuma: [ OFF ]
... numa_num_possible_cpus: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libcrypto: [ OFF ]
... libunwind: [ OFF ]
... libdw-dwarf-unwind: [ OFF ]
... zlib: [ OFF ]
... lzma: [ OFF ]
... get_cpuid: [ OFF ]
... bpf: [ OFF ]
... libaio: [ OFF ]
... libzstd: [ OFF ]
... disassembler-four-args: [ OFF ]
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libbfd: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... disassembler-four-args: [ on ]
GEN /tmp/build/perf/common-cmds.h
CC /tmp/build/perf/exec-cmd.o
<SNIP>
INSTALL perf_completion-script
INSTALL perf-tip
make: Leaving directory '/home/acme/git/perf/tools/perf'
$ ldd ~/bin/perf | grep asan
$
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: tiezhu yang <yangtiezhu@loongson.cn> Cc: xuefeng li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1592445961-28044-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Steven Rostedt (VMware) [Tue, 24 Mar 2020 20:08:48 +0000 (16:08 -0400)]
tools lib traceevent: Add handler for __builtin_expect()
In order to move pointer checks like IS_ERR_VALUE() out of the hotpath
and into the reader path of a trace event, user space tools need to be
able to parse that. IS_ERR_VALUE() is defined as:
Steven Rostedt (VMware) [Tue, 24 Mar 2020 20:08:47 +0000 (16:08 -0400)]
tools lib traceevent: Handle __attribute__((user)) in field names
Commit c61f13eaa1ee1 ("gcc-plugins: Add structleak for more stack
initialization") added "__attribute__((user))" to the user when
stackleak detector is enabled. This now appears in the field format of
system call trace events for system calls that have user buffers. The
"__attribute__((user))" breaks the parsing in libtraceevent. That needs
to be handled.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-mm@kvack.org Cc: linux-trace-devel@vger.kernel.org Link: http://lore.kernel.org/lkml/20200324200956.663647256@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Steven Rostedt (VMware) [Tue, 24 Mar 2020 20:08:46 +0000 (16:08 -0400)]
tools lib traceevent: Add append() function helper for appending strings
There's several locations that open code realloc and strcat() to append
text to strings. Add an append() function that takes a delimiter and a
string to append to another string.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jaewon Lim <jaewon31.kim@samsung.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Kook <keescook@chromium.org> Cc: linux-mm@kvack.org Cc: linux-trace-devel@vger.kernel.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lore.kernel.org/lkml/20200324200956.515118403@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Link: https://lore.kernel.org/r/20200616104304.2426081-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Michal Kalderon [Tue, 16 Jun 2020 09:34:08 +0000 (12:34 +0300)]
RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
Private data passed to iwarp_cm_handler is copied for connection request /
response, but ignored otherwise. If junk is passed, it is stored in the
event and used later in the event processing.
The driver passes an old junk pointer during connection close which leads
to a use-after-free on event processing. Set private data to NULL for
events that don 't have private data.
Gal Pressman [Sun, 14 Jun 2020 10:35:34 +0000 (13:35 +0300)]
RDMA/efa: Set maximum pkeys device attribute
The max_pkeys device attribute was not set in query device verb, set it to
one in order to account for the default pkey (0xffff). This information is
exposed to userspace and can cause malfunction
Aditya Pakki [Sun, 14 Jun 2020 04:11:48 +0000 (23:11 -0500)]
RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
In case of failure of alloc_ud_wq_attr(), the memory allocated by
rvt_alloc_rq() is not freed. Fix it by calling rvt_free_rq() using the
existing clean-up code.
Fixes: d310c4bf8aea ("IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs") Link: https://lore.kernel.org/r/20200614041148.131983-1-pakki001@umn.edu Signed-off-by: Aditya Pakki <pakki001@umn.edu> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Will Deacon [Fri, 29 May 2020 13:12:18 +0000 (14:12 +0100)]
arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
Unprivileged memory accesses generated by the so-called "translated"
instructions (e.g. STTR) at EL1 can cause EL0 watchpoints to fire
unexpectedly if kernel debugging is enabled. In such cases, the
hw_breakpoint logic will invoke the user overflow handler which will
typically raise a SIGTRAP back to the current task. This is futile when
returning back to the kernel because (a) the signal won't have been
delivered and (b) userspace can't handle the thing anyway.
Avoid invoking the user overflow handler for watchpoints triggered by
kernel uaccess routines, and instead single-step over the faulting
instruction as we would if no overflow handler had been installed.
(Fixes tag identifies the introduction of unprivileged memory accesses,
which exposed this latent bug in the hw_breakpoint code)
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Fixes: 57f4959bad0a ("arm64: kernel: Add support for User Access Override") Reported-by: Luis Machado <luis.machado@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
Weiping Zhang [Wed, 17 Jun 2020 06:18:37 +0000 (14:18 +0800)]
block: update hctx map when use multiple maps
There is an issue when tune the number for read and write queues,
if the total queue count was not changed. The hctx->type cannot
be updated, since __blk_mq_update_nr_hw_queues will return directly
if the total queue count has not been changed.
Shannon Nelson [Tue, 16 Jun 2020 15:06:26 +0000 (08:06 -0700)]
ionic: export features for vlans to use
Set up vlan_features for use by any vlans above us.
Fixes: beead698b173 ("ionic: Add the basic NDO callbacks for netdev support") Signed-off-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 16 Jun 2020 01:14:59 +0000 (18:14 -0700)]
ionic: no link check while resetting queues
If the driver is busy resetting queues after a change in
MTU or queue parameters, don't bother checking the link,
wait until the next watchdog cycle.
Fixes: 987c0871e8ae ("ionic: check for linkup in watchdog") Signed-off-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Wed, 17 Jun 2020 14:46:33 +0000 (15:46 +0100)]
rxrpc: Fix afs large storage transmission performance drop
Commit 2ad6691d988c, which moved the modification of the status annotation
for a packet in the Tx buffer prior to the retransmission moved the state
clearance, but managed to lose the bit that set it to UNACK.
Consequently, if a retransmission occurs, the packet is accidentally
changed to the ACK state (ie. 0) by masking it off, which means that the
packet isn't counted towards the tally of newly-ACK'd packets if it gets
hard-ACK'd. This then prevents the congestion control algorithm from
recovering properly.
Fix by reinstating the change of state to UNACK.
Spotted by the generic/460 xfstest.
Fixes: 2ad6691d988c ("rxrpc: Fix race between incoming ACK parser and retransmitter") Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 17 Jun 2020 22:01:23 +0000 (23:01 +0100)]
rxrpc: Fix handling of rwind from an ACK packet
The handling of the receive window size (rwind) from a received ACK packet
is not correct. The rxrpc_input_ackinfo() function currently checks the
current Tx window size against the rwind from the ACK to see if it has
changed, but then limits the rwind size before storing it in the tx_winsize
member and, if it increased, wake up the transmitting process. This means
that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
followed.
Fix this by limiting rwind before we compare it to tx_winsize.
The effect of this can be seen by enabling the rxrpc_rx_rwind_change
tracepoint.
Fixes: 702f2ac87a9a ("rxrpc: Wake up the transmitter if Rx window size increases on the peer") Signed-off-by: David Howells <dhowells@redhat.com>
Those last two bytes - 96 1f - aren't part of the original packet.
In the ax88179 RX path, the usbnet rx_fixup function trims a 2-byte
'alignment pseudo header' from the start of the packet, and sets the
length from a per-packet field populated by hardware. It looks like that
length field *includes* the 2-byte header; the current driver assumes
that it's excluded.
This change trims the 2-byte alignment header after we've set the packet
length, so the resulting packet length is correct. While we're moving
the comment around, this also fixes the spelling of 'pseudo'.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenz Brun [Thu, 11 Jun 2020 20:11:21 +0000 (22:11 +0200)]
drm/amdkfd: Use correct major in devcgroup check
The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.
This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).
Signed-off-by: Lorenz Brun <lorenz@brun.one> Fixes: 6b855f7b83d2f ("drm/amdkfd: Check against device cgroup") Reviewed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Tom Rix [Wed, 17 Jun 2020 12:40:28 +0000 (05:40 -0700)]
selinux: fix undefined return of cond_evaluate_expr
clang static analysis reports an undefined return
security/selinux/ss/conditional.c:79:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
return s[0];
^~~~~~~~~~~
static int cond_evaluate_expr( ...
{
u32 i;
int s[COND_EXPR_MAXDEPTH];
for (i = 0; i < expr->len; i++)
...
return s[0];
When expr->len is 0, the loop which sets s[0] never runs.
So return -1 if the loop never runs.
Cc: stable@vger.kernel.org Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Kaitao Cheng [Fri, 29 May 2020 14:12:14 +0000 (22:12 +0800)]
ftrace: Fix maybe-uninitialized compiler warning
During build compiler reports some 'false positive' warnings about
variables {'seq_ops', 'filtered_pids', 'other_pids'} may be used
uninitialized. This patch silences these warnings.
Also delete some useless spaces
Vishal Verma [Wed, 20 May 2020 22:50:26 +0000 (16:50 -0600)]
nvdimm/region: always show the 'align' attribute
It is possible that a platform that is capable of 'namespace labels'
comes up without the labels properly initialized. In this case, the
region's 'align' attribute is hidden. Howerver, once the user does
initialize he labels, the 'align' attribute still stays hidden, which is
unexpected.
The sysfs_update_group() API is meant to address this, and could be
called during region probe, but it has entanglements with the device
'lockdep_mutex'. Therefore, simply make the 'align' attribute always
visible. It doesn't matter what it says for label-less namespaces, since
it is not possible to change their allocation anyway.
Jens Axboe [Wed, 17 Jun 2020 21:00:04 +0000 (15:00 -0600)]
io_uring: reap poll completions while waiting for refs to drop on exit
If we're doing polled IO and end up having requests being submitted
async, then completions can come in while we're waiting for refs to
drop. We need to reap these manually, as nobody else will be looking
for them.
Break the wait into 1/20th of a second time waits, and check for done
poll completions if we time out. Otherwise we can have done poll
completions sitting in ctx->poll_list, which needs us to reap them but
we're just waiting for them.
Dmitry V. Levin [Tue, 2 Jun 2020 18:00:51 +0000 (21:00 +0300)]
s390: fix syscall_get_error for compat processes
If both the tracer and the tracee are compat processes, and gprs[2]
is assigned a value by __poke_user_compat, then the higher 32 bits
of gprs[2] are cleared, IS_ERR_VALUE() always returns false, and
syscall_get_error() always returns 0.
Fix the implementation by sign-extending the value for compat processes
the same way as x86 implementation does.
The bug was exposed to user space by commit 201766a20e30f ("ptrace: add
PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite.
This change fixes strace syscall tampering on s390.