]> www.infradead.org Git - users/willy/linux.git/log
users/willy/linux.git
4 years agotipc: fix size validations for the MSG_CRYPTO type
Max VA [Mon, 25 Oct 2021 15:31:53 +0000 (17:31 +0200)]
tipc: fix size validations for the MSG_CRYPTO type

The function tipc_crypto_key_rcv is used to parse MSG_CRYPTO messages
to receive keys from other nodes in the cluster in order to decrypt any
further messages from them.
This patch verifies that any supplied sizes in the message body are
valid for the received message.

Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Signed-off-by: Max VA <maxv@sentinelone.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: port100: fix using -ERRNO as command type mask
Krzysztof Kozlowski [Mon, 25 Oct 2021 14:49:36 +0000 (16:49 +0200)]
nfc: port100: fix using -ERRNO as command type mask

During probing, the driver tries to get a list (mask) of supported
command types in port100_get_command_type_mask() function.  The value
is u64 and 0 is treated as invalid mask (no commands supported).  The
function however returns also -ERRNO as u64 which will be interpret as
valid command mask.

Return 0 on every error case of port100_get_command_type_mask(), so the
probing will stop.

Cc: <stable@vger.kernel.org>
Fixes: 0347a6ab300a ("NFC: port100: Commands mechanism implementation")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
David S. Miller [Tue, 26 Oct 2021 12:26:09 +0000 (13:26 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-10-25

This series contains updates to ice driver only.

Dave adds event handler for LAG NETDEV_UNREGISTER to unlink device from
link aggregate.

Yongxin Liu adds a check for PTP support during release which would
cause a call trace on non-PTP supported devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: multicast: calculate csum of looped-back and forwarded packets
Cyril Strejc [Sun, 24 Oct 2021 20:14:25 +0000 (22:14 +0200)]
net: multicast: calculate csum of looped-back and forwarded packets

During a testing of an user-space application which transmits UDP
multicast datagrams and utilizes multicast routing to send the UDP
datagrams out of defined network interfaces, I've found a multicast
router does not fill-in UDP checksum into locally produced, looped-back
and forwarded UDP datagrams, if an original output NIC the datagrams
are sent to has UDP TX checksum offload enabled.

The datagrams are sent malformed out of the NIC the datagrams have been
forwarded to.

It is because:

1. If TX checksum offload is enabled on the output NIC, UDP checksum
   is not calculated by kernel and is not filled into skb data.

2. dev_loopback_xmit(), which is called solely by
   ip_mc_finish_output(), sets skb->ip_summed = CHECKSUM_UNNECESSARY
   unconditionally.

3. Since 35fc92a9 ("[NET]: Allow forwarding of ip_summed except
   CHECKSUM_COMPLETE"), the ip_summed value is preserved during
   forwarding.

4. If ip_summed != CHECKSUM_PARTIAL, checksum is not calculated during
   a packet egress.

The minimum fix in dev_loopback_xmit():

1. Preserves skb->ip_summed CHECKSUM_PARTIAL. This is the
   case when the original output NIC has TX checksum offload enabled.
   The effects are:

     a) If the forwarding destination interface supports TX checksum
        offloading, the NIC driver is responsible to fill-in the
        checksum.

     b) If the forwarding destination interface does NOT support TX
        checksum offloading, checksums are filled-in by kernel before
        skb is submitted to the NIC driver.

     c) For local delivery, checksum validation is skipped as in the
        case of CHECKSUM_UNNECESSARY, thanks to skb_csum_unnecessary().

2. Translates ip_summed CHECKSUM_NONE to CHECKSUM_UNNECESSARY. It
   means, for CHECKSUM_NONE, the behavior is unmodified and is there
   to skip a looped-back packet local delivery checksum validation.

Signed-off-by: Cyril Strejc <cyril.strejc@skoda.cz>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: pci: Recycle received packet upon allocation failure
Ido Schimmel [Sun, 24 Oct 2021 06:40:14 +0000 (09:40 +0300)]
mlxsw: pci: Recycle received packet upon allocation failure

When the driver fails to allocate a new Rx buffer, it passes an empty Rx
descriptor (contains zero address and size) to the device and marks it
as invalid by setting the skb pointer in the descriptor's metadata to
NULL.

After processing enough Rx descriptors, the driver will try to process
the invalid descriptor, but will return immediately seeing that the skb
pointer is NULL. Since the driver no longer passes new Rx descriptors to
the device, the Rx queue will eventually become full and the device will
start to drop packets.

Fix this by recycling the received packet if allocation of the new
packet failed. This means that allocation is no longer performed at the
end of the Rx routine, but at the start, before tearing down the DMA
mapping of the received packet.

Remove the comment about the descriptor being zeroed as it is no longer
correct. This is OK because we either use the descriptor as-is (when
recycling) or overwrite its address and size fields with that of the
newly allocated Rx buffer.

The issue was discovered when a process ("perf") consumed too much
memory and put the system under memory pressure. It can be reproduced by
injecting slab allocation failures [1]. After the fix, the Rx queue no
longer comes to a halt.

[1]
 # echo 10 > /sys/kernel/debug/failslab/times
 # echo 1000 > /sys/kernel/debug/failslab/interval
 # echo 100 > /sys/kernel/debug/failslab/probability

 FAULT_INJECTION: forcing a failure.
 name failslab, interval 1000, probability 100, space 0, times 8
 [...]
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x34/0x44
  should_fail.cold+0x32/0x37
  should_failslab+0x5/0x10
  kmem_cache_alloc_node+0x23/0x190
  __alloc_skb+0x1f9/0x280
  __netdev_alloc_skb+0x3a/0x150
  mlxsw_pci_rdq_skb_alloc+0x24/0x90
  mlxsw_pci_cq_tasklet+0x3dc/0x1200
  tasklet_action_common.constprop.0+0x9f/0x100
  __do_softirq+0xb5/0x252
  irq_exit_rcu+0x7a/0xa0
  common_interrupt+0x83/0xa0
  </IRQ>
  asm_common_interrupt+0x1e/0x40
 RIP: 0010:cpuidle_enter_state+0xc8/0x340
 [...]
 mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ

Fixes: eda6500a987a ("mlxsw: Add PCI bus implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoice: check whether PTP is initialized in ice_ptp_release()
Yongxin Liu [Mon, 11 Oct 2021 07:02:16 +0000 (15:02 +0800)]
ice: check whether PTP is initialized in ice_ptp_release()

PTP is currently only supported on E810 devices, it is checked
in ice_ptp_init(). However, there is no check in ice_ptp_release().
For other E800 series devices, ice_ptp_release() will be wrongly executed.

Fix the following calltrace.

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  Workqueue: ice ice_service_task [ice]
  Call Trace:
   dump_stack_lvl+0x5b/0x82
   dump_stack+0x10/0x12
   register_lock_class+0x495/0x4a0
   ? find_held_lock+0x3c/0xb0
   __lock_acquire+0x71/0x1830
   lock_acquire+0x1e6/0x330
   ? ice_ptp_release+0x3c/0x1e0 [ice]
   ? _raw_spin_lock+0x19/0x70
   ? ice_ptp_release+0x3c/0x1e0 [ice]
   _raw_spin_lock+0x38/0x70
   ? ice_ptp_release+0x3c/0x1e0 [ice]
   ice_ptp_release+0x3c/0x1e0 [ice]
   ice_prepare_for_reset+0xcb/0xe0 [ice]
   ice_do_reset+0x38/0x110 [ice]
   ice_service_task+0x138/0xf10 [ice]
   ? __this_cpu_preempt_check+0x13/0x20
   process_one_work+0x26a/0x650
   worker_thread+0x3f/0x3b0
   ? __kthread_parkme+0x51/0xb0
   ? process_one_work+0x650/0x650
   kthread+0x161/0x190
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x1f/0x30

Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoice: Respond to a NETDEV_UNREGISTER event for LAG
Dave Ertman [Thu, 7 Oct 2021 15:40:31 +0000 (08:40 -0700)]
ice: Respond to a NETDEV_UNREGISTER event for LAG

When the PF is a member of a link aggregate, and the driver
is removed, the process will hang unless we respond to the
NETDEV_UNREGISTER event that is sent to the event_handler
for LAG.

Add a case statement for the ice_lag_event_handler to unlink
the PF from the link aggregate.

Also remove code that was incorrectly applying a dev_hold to
peer_netdevs that were associated with the ice driver.

Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agonet-sysfs: initialize uid and gid before calling net_ns_get_ownership
Xin Long [Mon, 25 Oct 2021 06:31:48 +0000 (02:31 -0400)]
net-sysfs: initialize uid and gid before calling net_ns_get_ownership

Currently in net_ns_get_ownership() it may not be able to set uid or gid
if make_kuid or make_kgid returns an invalid value, and an uninit-value
issue can be triggered by this.

This patch is to fix it by initializing the uid and gid before calling
net_ns_get_ownership(), as it does in kobject_get_ownership()

Fixes: e6dee9f3893c ("net-sysfs: add netdev_change_owner()")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoxen/netfront: stop tx queues during live migration
Dongli Zhang [Fri, 22 Oct 2021 23:31:39 +0000 (16:31 -0700)]
xen/netfront: stop tx queues during live migration

The tx queues are not stopped during the live migration. As a result, the
ndo_start_xmit() may access netfront_info->queues which is freed by
talk_to_netback()->xennet_destroy_queues().

This patch is to netif_device_detach() at the beginning of xen-netfront
resuming, and netif_device_attach() at the end of resuming.

     CPU A                                CPU B

 talk_to_netback()
 -> if (info->queues)
        xennet_destroy_queues(info);
    to free netfront_info->queues

                                        xennet_start_xmit()
                                        to access netfront_info->queues

  -> err = xennet_create_queues(info, &num_queues);

The idea is borrowed from virtio-net.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Prevent infinite while loop in skb_tx_hash()
Michael Chan [Mon, 25 Oct 2021 09:05:28 +0000 (05:05 -0400)]
net: Prevent infinite while loop in skb_tx_hash()

Drivers call netdev_set_num_tc() and then netdev_set_tc_queue()
to set the queue count and offset for each TC.  So the queue count
and offset for the TCs may be zero for a short period after dev->num_tc
has been set.  If a TX packet is being transmitted at this time in the
code path netdev_pick_tx() -> skb_tx_hash(), skb_tx_hash() may see
nonzero dev->num_tc but zero qcount for the TC.  The while loop that
keeps looping while hash >= qcount will not end.

Fix it by checking the TC's qcount to be nonzero before using it.

Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: nxp: lpc_eth.c: avoid hang when bringing interface down
Trevor Woerner [Sun, 24 Oct 2021 17:50:02 +0000 (13:50 -0400)]
net: nxp: lpc_eth.c: avoid hang when bringing interface down

A hard hang is observed whenever the ethernet interface is brought
down. If the PHY is stopped before the LPC core block is reset,
the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open() I
re-arranged the ordering of the functions calls in lpc_eth_close() to
reset the hardware before stopping the PHY.
Fixes: b7370112f519 ("lpc32xx: Added ethernet driver")
Signed-off-by: Trevor Woerner <twoerner@gmail.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ksettings-locking-fixes'
David S. Miller [Mon, 25 Oct 2021 13:06:43 +0000 (14:06 +0100)]
Merge branch 'ksettings-locking-fixes'

Andrew Lunn says:

====================
ksettings_{get|set} lock fixes

Walter Stoll <Walter.Stoll@duagon.com> reported a race condition
between "ethtool -s eth0 speed 100 duplex full autoneg off" and phylib
reading the current status from the PHY. Both ksetting_get and
ksetting_set fail the take the phydev mutex, and as a result, there is
a small window of time where the phydev members are not self
consistent.

Patch 1 fixes phy_ethtool_ksettings_get by adding the needed lock.
Patches 2 and 3 move code around and perform to refactoring, to allow
patch 4 to fix phy_ethtool_ksettings_set by added the lock.

Thanks go to Walter for the detailed origional report, suggested fix,
and testing of the proposed patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: phy_ethtool_ksettings_set: Lock the PHY while changing settings
Andrew Lunn [Sun, 24 Oct 2021 19:48:05 +0000 (21:48 +0200)]
phy: phy_ethtool_ksettings_set: Lock the PHY while changing settings

There is a race condition where the PHY state machine can change
members of the phydev structure at the same time userspace requests a
change via ethtool. To prevent this, have phy_ethtool_ksettings_set
take the PHY lock.

Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Reported-by: Walter Stoll <Walter.Stoll@duagon.com>
Suggested-by: Walter Stoll <Walter.Stoll@duagon.com>
Tested-by: Walter Stoll <Walter.Stoll@duagon.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: phy_start_aneg: Add an unlocked version
Andrew Lunn [Sun, 24 Oct 2021 19:48:04 +0000 (21:48 +0200)]
phy: phy_start_aneg: Add an unlocked version

Split phy_start_aneg into a wrapper which takes the PHY lock, and a
helper doing the real work. This will be needed when
phy_ethtook_ksettings_set takes the lock.

Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: phy_ethtool_ksettings_set: Move after phy_start_aneg
Andrew Lunn [Sun, 24 Oct 2021 19:48:03 +0000 (21:48 +0200)]
phy: phy_ethtool_ksettings_set: Move after phy_start_aneg

This allows it to make use of a helper which assume the PHY is already
locked.

Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agophy: phy_ethtool_ksettings_get: Lock the phy for consistency
Andrew Lunn [Sun, 24 Oct 2021 19:48:02 +0000 (21:48 +0200)]
phy: phy_ethtool_ksettings_get: Lock the phy for consistency

The PHY structure should be locked while copying information out if
it, otherwise there is no guarantee of self consistency. Without the
lock the PHY state machine could be updating the structure.

Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: microchip: lan743x: Fix dma allocation failure by using dma_set_mask_a...
Yuiko Oshino [Fri, 22 Oct 2021 15:53:43 +0000 (11:53 -0400)]
net: ethernet: microchip: lan743x: Fix dma allocation failure by using dma_set_mask_and_coherent

The dma failure was reported in the raspberry pi github (issue #4117).
https://github.com/raspberrypi/linux/issues/4117
The use of dma_set_mask_and_coherent fixes the issue.
Tested on 32/64-bit raspberry pi CM4 and 64-bit ubuntu x86 PC with EVB-LAN7430.

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: microchip: lan743x: Fix driver crash when lan743x_pm_resume fails
Yuiko Oshino [Fri, 22 Oct 2021 15:13:53 +0000 (11:13 -0400)]
net: ethernet: microchip: lan743x: Fix driver crash when lan743x_pm_resume fails

The driver needs to clean up and return when the initialization fails on resume.

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agofcnal-test: kill hanging ping/nettest binaries on cleanup
Florian Westphal [Thu, 21 Oct 2021 14:02:47 +0000 (16:02 +0200)]
fcnal-test: kill hanging ping/nettest binaries on cleanup

On my box I see a bunch of ping/nettest processes hanging
around after fcntal-test.sh is done.

Clean those up before netns deletion.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211021140247.29691-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge branch 'sctp-enhancements-for-the-verification-tag'
Jakub Kicinski [Fri, 22 Oct 2021 19:36:47 +0000 (12:36 -0700)]
Merge branch 'sctp-enhancements-for-the-verification-tag'

Xin Long says:

====================
sctp: enhancements for the verification tag

This patchset is to address CVE-2021-3772:

  A flaw was found in the Linux SCTP stack. A blind attacker may be able to
  kill an existing SCTP association through invalid chunks if the attacker
  knows the IP-addresses and port numbers being used and the attacker can
  send packets with spoofed IP addresses.

This is caused by the missing VTAG verification for the received chunks
and the incorrect vtag for the ABORT used to reply to these invalid
chunks.

This patchset is to go over all processing functions for the received
chunks and do:

1. Make sure sctp_vtag_verify() is called firstly to verify the vtag from
   the received chunk and discard this chunk if it fails. With some
   exceptions:

   a. sctp_sf_do_5_1B_init()/5_2_2_dupinit()/9_2_reshutack(), processing
      INIT chunk, as sctphdr vtag is always 0 in INIT chunk.

   b. sctp_sf_do_5_2_4_dupcook(), processing dupicate COOKIE_ECHO chunk,
      as the vtag verification will be done by sctp_tietags_compare() and
      then it takes right actions according to the return.

   c. sctp_sf_shut_8_4_5(), processing SHUTDOWN_ACK chunk for cookie_wait
      and cookie_echoed state, as RFC demand sending a SHUTDOWN_COMPLETE
      even if the vtag verification failed.

   d. sctp_sf_ootb(), called in many types of chunks for closed state or
      no asoc, as the same reason to c.

2. Always use the vtag from the received INIT chunk to make the response
   ABORT in sctp_ootb_pkt_new().

3. Fix the order for some checks and add some missing checks for the
   received chunk.

This patch series has been tested with SCTP TAHI testing to make sure no
regression caused on protocol conformance.
====================

Link: https://lore.kernel.org/r/cover.1634730082.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: add vtag check in sctp_sf_ootb
Xin Long [Wed, 20 Oct 2021 11:42:47 +0000 (07:42 -0400)]
sctp: add vtag check in sctp_sf_ootb

sctp_sf_ootb() is called when processing DATA chunk in closed state,
and many other places are also using it.

The vtag in the chunk's sctphdr should be verified, otherwise, as
later in chunk length check, it may send abort with the existent
asoc's vtag, which can be exploited by one to cook a malicious
chunk to terminate a SCTP asoc.

When fails to verify the vtag from the chunk, this patch sets asoc
to NULL, so that the abort will be made with the vtag from the
received chunk later.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: add vtag check in sctp_sf_do_8_5_1_E_sa
Xin Long [Wed, 20 Oct 2021 11:42:46 +0000 (07:42 -0400)]
sctp: add vtag check in sctp_sf_do_8_5_1_E_sa

sctp_sf_do_8_5_1_E_sa() is called when processing SHUTDOWN_ACK chunk
in cookie_wait and cookie_echoed state.

The vtag in the chunk's sctphdr should be verified, otherwise, as
later in chunk length check, it may send abort with the existent
asoc's vtag, which can be exploited by one to cook a malicious
chunk to terminate a SCTP asoc.

Note that when fails to verify the vtag from SHUTDOWN-ACK chunk,
SHUTDOWN COMPLETE message will still be sent back to peer, but
with the vtag from SHUTDOWN-ACK chunk, as said in 5) of
rfc4960#section-8.4.

While at it, also remove the unnecessary chunk length check from
sctp_sf_shut_8_4_5(), as it's already done in both places where
it calls sctp_sf_shut_8_4_5().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: add vtag check in sctp_sf_violation
Xin Long [Wed, 20 Oct 2021 11:42:45 +0000 (07:42 -0400)]
sctp: add vtag check in sctp_sf_violation

sctp_sf_violation() is called when processing HEARTBEAT_ACK chunk
in cookie_wait state, and some other places are also using it.

The vtag in the chunk's sctphdr should be verified, otherwise, as
later in chunk length check, it may send abort with the existent
asoc's vtag, which can be exploited by one to cook a malicious
chunk to terminate a SCTP asoc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: fix the processing for COOKIE_ECHO chunk
Xin Long [Wed, 20 Oct 2021 11:42:44 +0000 (07:42 -0400)]
sctp: fix the processing for COOKIE_ECHO chunk

1. In closed state: in sctp_sf_do_5_1D_ce():

  When asoc is NULL, making packet for abort will use chunk's vtag
  in sctp_ootb_pkt_new(). But when asoc exists, vtag from the chunk
  should be verified before using peer.i.init_tag to make packet
  for abort in sctp_ootb_pkt_new(), and just discard it if vtag is
  not correct.

2. In the other states: in sctp_sf_do_5_2_4_dupcook():

  asoc always exists, but duplicate cookie_echo's vtag will be
  handled by sctp_tietags_compare() and then take actions, so before
  that we only verify the vtag for the abort sent for invalid chunk
  length.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: fix the processing for INIT_ACK chunk
Xin Long [Wed, 20 Oct 2021 11:42:43 +0000 (07:42 -0400)]
sctp: fix the processing for INIT_ACK chunk

Currently INIT_ACK chunk in non-cookie_echoed state is processed in
sctp_sf_discard_chunk() to send an abort with the existent asoc's
vtag if the chunk length is not valid. But the vtag in the chunk's
sctphdr is not verified, which may be exploited by one to cook a
malicious chunk to terminal a SCTP asoc.

sctp_sf_discard_chunk() also is called in many other places to send
an abort, and most of those have this problem. This patch is to fix
it by sending abort with the existent asoc's vtag only if the vtag
from the chunk's sctphdr is verified in sctp_sf_discard_chunk().

Note on sctp_sf_do_9_1_abort() and sctp_sf_shutdown_pending_abort(),
the chunk length has been verified before sctp_sf_discard_chunk(),
so replace it with sctp_sf_discard(). On sctp_sf_do_asconf_ack() and
sctp_sf_do_asconf(), move the sctp_chunk_length_valid check ahead of
sctp_sf_discard_chunk(), then replace it with sctp_sf_discard().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: fix the processing for INIT chunk
Xin Long [Wed, 20 Oct 2021 11:42:42 +0000 (07:42 -0400)]
sctp: fix the processing for INIT chunk

This patch fixes the problems below:

1. In non-shutdown_ack_sent states: in sctp_sf_do_5_1B_init() and
   sctp_sf_do_5_2_2_dupinit():

  chunk length check should be done before any checks that may cause
  to send abort, as making packet for abort will access the init_tag
  from init_hdr in sctp_ootb_pkt_new().

2. In shutdown_ack_sent state: in sctp_sf_do_9_2_reshutack():

  The same checks as does in sctp_sf_do_5_2_2_dupinit() is needed
  for sctp_sf_do_9_2_reshutack().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agosctp: use init_tag from inithdr for ABORT chunk
Xin Long [Wed, 20 Oct 2021 11:42:41 +0000 (07:42 -0400)]
sctp: use init_tag from inithdr for ABORT chunk

Currently Linux SCTP uses the verification tag of the existing SCTP
asoc when failing to process and sending the packet with the ABORT
chunk. This will result in the peer accepting the ABORT chunk and
removing the SCTP asoc. One could exploit this to terminate a SCTP
asoc.

This patch is to fix it by always using the initiate tag of the
received INIT chunk for the ABORT chunk to be sent.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoskb_expand_head() adjust skb->truesize incorrectly
Vasily Averin [Fri, 22 Oct 2021 10:28:37 +0000 (13:28 +0300)]
skb_expand_head() adjust skb->truesize incorrectly

Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
- skb_set_owner_w() for newly cloned skb is called too early,
before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
- pskb_expand_head() does not adjust truesize in (skb->sk) case.
In this case sk->sk_wmem_alloc should be adjusted too.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/644330dd-477e-0462-83bf-9f514c41edd1@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'mac80211-for-net-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 22 Oct 2021 18:12:45 +0000 (11:12 -0700)]
Merge tag 'mac80211-for-net-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two small fixes:
 * RCU misuse in scan processing in cfg80211
 * missing size check for HE data in mac80211 mesh

* tag 'mac80211-for-net-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  cfg80211: scan: fix RCU in cfg80211_add_nontrans_list()
  mac80211: mesh: fix HE operation element length check
====================

Link: https://lore.kernel.org/r/20211021154351.134297-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoMerge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 22 Oct 2021 05:06:08 +0000 (19:06 -1000)]
Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Nothing too crazy at the end of the cycle, the kmb modesetting fixes
  are probably a bit large but it's not a major driver, and its fixing
  monitor doesn't turn on type problems.

  Otherwise it's just a few minor patches, one ast regression revert, an
  msm power stability fix.

  ast:
   - fix regression with connector detect

  msm:
   - fix power stability issue

  msxfb:
   - fix crash on unload

  panel:
   - sync fix

  kmb:
   - modesetting fixes"

* tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm:
  Revert "drm/ast: Add detect function support"
  drm/kmb: Enable ADV bridge after modeset
  drm/kmb: Corrected typo in handle_lcd_irq
  drm/kmb: Disable change of plane parameters
  drm/kmb: Remove clearing DPHY regs
  drm/kmb: Limit supported mode to 1080p
  drm/kmb: Work around for higher system clock
  drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
  drm: mxsfb: Fix NULL pointer dereference crash on unload
  drm/msm/devfreq: Restrict idle clamping to a618 for now

4 years agomemblock: exclude MEMBLOCK_NOMAP regions from kmemleak
Mike Rapoport [Thu, 21 Oct 2021 07:09:29 +0000 (10:09 +0300)]
memblock: exclude MEMBLOCK_NOMAP regions from kmemleak

Vladimir Zapolskiy reports:

Commit a7259df76702 ("memblock: make memblock_find_in_range method
private") invokes a kernel panic while running kmemleak on OF platforms
with nomaped regions:

  Unable to handle kernel paging request at virtual address fff000021e00000
  [...]
    scan_block+0x64/0x170
    scan_gray_list+0xe8/0x17c
    kmemleak_scan+0x270/0x514
    kmemleak_write+0x34c/0x4ac

The memory allocated from memblock is registered with kmemleak, but if
it is marked MEMBLOCK_NOMAP it won't have linear map entries so an
attempt to scan such areas will fault.

Ideally, memblock_mark_nomap() would inform kmemleak to ignore
MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces
operating on physical addresses can use __va() conversion.

Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP
take care of informing kmemleak to ignore such memory.

Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoRevert "memblock: exclude NOMAP regions from kmemleak"
Mike Rapoport [Thu, 21 Oct 2021 07:09:28 +0000 (10:09 +0300)]
Revert "memblock: exclude NOMAP regions from kmemleak"

Commit 6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak")
breaks boot on EFI systems with kmemleak and VM_DEBUG enabled:

  efi: Processing EFI memory map:
  efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
  efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
  ------------[ cut here ]------------
  kernel BUG at mm/kmemleak.c:1140!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-20211019+ #104
  pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : kmemleak_free_part_phys+0x64/0x8c
  lr : kmemleak_free_part_phys+0x38/0x8c
  sp : ffff800011eafbc0
  x29: ffff800011eafbc0 x28: 1fffff7fffb41c0d x27: fffffbfffda0e068
  x26: 0000000092000000 x25: 1ffff000023d5f94 x24: ffff800011ed84d0
  x23: ffff800011ed84c0 x22: ffff800011ed83d8 x21: 0000000000900000
  x20: ffff800011782000 x19: 0000000092000000 x18: ffff800011ee0730
  x17: 0000000000000000 x16: 0000000000000000 x15: 1ffff0000233252c
  x14: ffff800019a905a0 x13: 0000000000000001 x12: ffff7000023d5ed7
  x11: 1ffff000023d5ed6 x10: ffff7000023d5ed6 x9 : dfff800000000000
  x8 : ffff800011eaf6b7 x7 : 0000000000000001 x6 : ffff800011eaf6b0
  x5 : 00008ffffdc2a12a x4 : ffff7000023d5ed7 x3 : 1ffff000023dbf99
  x2 : 1ffff000022f0463 x1 : 0000000000000000 x0 : ffffffffffffffff
  Call trace:
   kmemleak_free_part_phys+0x64/0x8c
   memblock_mark_nomap+0x5c/0x78
   reserve_regions+0x294/0x33c
   efi_init+0x2d0/0x490
   setup_arch+0x80/0x138
   start_kernel+0xa0/0x3ec
   __primary_switched+0xc0/0xc8
  Code: 34000041 97d526e7 f9418e80 36000040 (d4210000)
  random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0
  ---[ end trace 0000000000000000 ]---

The crash happens because kmemleak_free_part_phys() tries to use __va()
before memstart_addr is initialized and this triggers a VM_BUG_ON() in
arch/arm64/include/asm/memory.h:

Revert 6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak"),
the issue it is fixing will be fixed differently.

Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 22 Oct 2021 03:27:17 +0000 (17:27 -1000)]
Merge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ucounts fixes from Eric Biederman:
 "There has been one very hard to track down bug in the ucount code that
  we have been tracking since roughly v5.14 was released. Alex managed
  to find a reliable reproducer a few days ago and then I was able to
  instrument the code and figure out what the issue was.

  It turns out the sigqueue_alloc single atomic operation optimization
  did not play nicely with ucounts multiple level rlimits. It turned out
  that either sigqueue_alloc or sigqueue_free could be operating on
  multiple levels and trigger the conditions for the optimization on
  more than one level at the same time.

  To deal with that situation I have introduced inc_rlimit_get_ucounts
  and dec_rlimit_put_ucounts that just focuses on the optimization and
  the rlimit and ucount changes.

  While looking into the big bug I found I couple of other little issues
  so I am including those fixes here as well.

  When I have time I would very much like to dig into process ownership
  of the shared signal queue and see if we could pick a single owner for
  the entire queue so that all of the rlimits can count to that owner.
  That should entirely remove the need to call get_ucounts and
  put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult
  because Linux unlike POSIX supports setuid that works on a single
  thread"

* 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
  ucounts: Proper error handling in set_cred_ucounts
  ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
  ucounts: Fix signal ucount refcounting

4 years agoMerge tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 22 Oct 2021 01:36:50 +0000 (15:36 -1000)]
Merge tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, and can.

  We'll have one more fix for a socket accounting regression, it's still
  getting polished. Otherwise things look fine.

  Current release - regressions:

   - revert "vrf: reset skb conntrack connection on VRF rcv", there are
     valid uses for previous behavior

   - can: m_can: fix iomap_read_fifo() and iomap_write_fifo()

  Current release - new code bugs:

   - mlx5: e-switch, return correct error code on group creation failure

  Previous releases - regressions:

   - sctp: fix transport encap_port update in sctp_vtag_verify

   - stmmac: fix E2E delay mechanism (in PTP timestamping)

  Previous releases - always broken:

   - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr

   - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of
     init

   - netfilter: ipvs: make global sysctl read-only in non-init netns

   - tcp: md5: fix selection between vrf and non-vrf keys

   - ipv6: count rx stats on the orig netdev when forwarding

   - bridge: mcast: use multicast_membership_interval for IGMPv3

   - can:
      - j1939: fix UAF for rx_kref of j1939_priv abort sessions on
        receiving bad messages

      - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix
        return error on FC timeout on TX path

   - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited

   - hns3: schedule the polling again when allocation fails, prevent
     stalls

   - drivers: add missing of_node_put() when aborting
     for_each_available_child_of_node()

   - ptp: fix possible memory leak and UAF in ptp_clock_register()

   - e1000e: fix packet loss in burst mode on Tiger Lake and later

   - mlx5e: ipsec: fix more checksum offload issues"

* tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
  usbnet: sanity check for maxpacket
  net: enetc: make sure all traffic classes can send large frames
  net: enetc: fix ethtool counter name for PM0_TERR
  ptp: free 'vclock_index' in ptp_clock_release()
  sfc: Don't use netif_info before net_device setup
  sfc: Export fibre-specific supported link modes
  net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
  net/mlx5e: IPsec: Fix a misuse of the software parser's fields
  net/mlx5e: Fix vlan data lost during suspend flow
  net/mlx5: E-switch, Return correct error code on group creation failure
  net/mlx5: Lag, change multipath and bonding to be mutually exclusive
  ice: Add missing E810 device ids
  igc: Update I226_K device ID
  e1000e: Fix packet loss on Tiger Lake and later
  e1000e: Separate TGP board type from SPT
  ptp: Fix possible memory leak in ptp_clock_register()
  net: stmmac: Fix E2E delay mechanism
  nfc: st95hf: Make spi remove() callback return zero
  net: hns3: disable sriov before unload hclge layer
  net: hns3: fix vf reset workqueue cannot exit
  ...

4 years agoMerge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 22 Oct 2021 01:30:09 +0000 (15:30 -1000)]
Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug exposed by a previous fix, where running guests with
   certain SMT topologies could crash the host on Power8.

 - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is
   enabled.

Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider.

* tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/smp: do not decrement idle task preempt count in CPU offline
  powerpc/idle: Don't corrupt back chain when going idle

4 years agoRevert "drm/ast: Add detect function support"
Kim Phillips [Thu, 21 Oct 2021 15:30:06 +0000 (10:30 -0500)]
Revert "drm/ast: Add detect function support"

This reverts commit aae74ff9caa8de9a45ae2e46068c417817392a26,
since it prevents my AMD Milan system from booting, with:

[   27.189558] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   27.197506] #PF: supervisor write access in kernel mode
[   27.203333] #PF: error_code(0x0002) - not-present page
[   27.209064] PGD 0 P4D 0
[   27.211885] Oops: 0002 [#1] PREEMPT SMP NOPTI
[   27.216744] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6+ #15
[   27.223928] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[   27.232955] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[   27.238397] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[   27.259350] RSP: 0018:ffffc42d00003ee8 EFLAGS: 00010086
[   27.265176] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000101
[   27.273134] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 0000000000000001
[   27.281084] RBP: ffffc42d00003f70 R08: 0000000000000000 R09: 00000000000003eb
[   27.289043] R10: ffffa0860cb300d0 R11: ffffa0c44de290b0 R12: ffffc42d00003ef8
[   27.297002] R13: 00000000fffef200 R14: ffffa0c44de18dc0 R15: ffffa0867a882350
[   27.304961] FS:  0000000000000000(0000) GS:ffffa0c44de00000(0000) knlGS:0000000000000000
[   27.313988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.320396] CR2: 0000000000000000 CR3: 000000014569c001 CR4: 0000000000770ef0
[   27.328346] PKRU: 55555554
[   27.331359] Call Trace:
[   27.334073]  <IRQ>
[   27.336314]  ? __queue_work+0x420/0x420
[   27.340589]  ? lapic_next_event+0x21/0x30
[   27.345060]  ? clockevents_program_event+0x8f/0xe0
[   27.350402]  __do_softirq+0xfb/0x2db
[   27.354388]  irq_exit_rcu+0x98/0xd0
[   27.358275]  sysvec_apic_timer_interrupt+0xac/0xd0
[   27.363620]  </IRQ>
[   27.365955]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   27.371685] RIP: 0010:cpuidle_enter_state+0xcc/0x390
[   27.377292] Code: 3d 01 79 0a 50 e8 44 ed 77 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 f8 77 ff 80 7d d7 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 17 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d
[   27.398243] RSP: 0018:ffffffffb0e03dc8 EFLAGS: 00000246
[   27.404069] RAX: ffffa0c44de00000 RBX: 0000000000000001 RCX: 000000000000001f
[   27.412028] RDX: 0000000000000000 RSI: ffffffffb0bafc1f RDI: ffffffffb0bbdb81
[   27.419986] RBP: ffffffffb0e03e00 R08: 00000006549f8f3f R09: ffffffffb1065200
[   27.427935] R10: ffffa0c44de27ae4 R11: ffffa0c44de27ac4 R12: ffffa0c5634cb000
[   27.435894] R13: ffffffffb1065200 R14: 00000006549f8f3f R15: 0000000000000001
[   27.443854]  ? cpuidle_enter_state+0xbb/0x390
[   27.448712]  cpuidle_enter+0x2e/0x40
[   27.452695]  call_cpuidle+0x23/0x40
[   27.456584]  do_idle+0x1f0/0x270
[   27.460181]  cpu_startup_entry+0x20/0x30
[   27.464553]  rest_init+0xd4/0xe0
[   27.468149]  arch_call_rest_init+0xe/0x1b
[   27.472619]  start_kernel+0x6bc/0x6e2
[   27.476764]  x86_64_start_reservations+0x24/0x26
[   27.481912]  x86_64_start_kernel+0x75/0x79
[   27.486477]  secondary_startup_64_no_verify+0xb0/0xbb
[   27.492111] Modules linked in: kvm_amd(+) kvm ipmi_si(+) ipmi_devintf rapl wmi_bmof ipmi_msghandler input_leds ccp k10temp mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea aesni_intel sysfillrect crypto_simd sysimgblt fb_sys_fops cryptd hid_generic cec nvme ahci usbhid drm e1000e nvme_core hid libahci i2c_piix4 wmi
[   27.551789] CR2: 0000000000000000
[   27.555482] ---[ end trace 897987dfe93dccc6 ]---
[   27.560630] RIP: 0010:run_timer_softirq+0x38b/0x4a0
[   27.566069] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c
[   27.587021] RSP: 0018:ffffc42d00003ee8 EFLAGS: 00010086
[   27.592848] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000101
[   27.600808] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 0000000000000001
[   27.608765] RBP: ffffc42d00003f70 R08: 0000000000000000 R09: 00000000000003eb
[   27.616716] R10: ffffa0860cb300d0 R11: ffffa0c44de290b0 R12: ffffc42d00003ef8
[   27.624673] R13: 00000000fffef200 R14: ffffa0c44de18dc0 R15: ffffa0867a882350
[   27.632624] FS:  0000000000000000(0000) GS:ffffa0c44de00000(0000) knlGS:0000000000000000
[   27.641650] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.648159] CR2: 0000000000000000 CR3: 000000014569c001 CR4: 0000000000770ef0
[   27.656119] PKRU: 55555554
[   27.659133] Kernel panic - not syncing: Fatal exception in interrupt
[   29.030411] Shutting down cpus with NMI
[   29.034699] Kernel Offset: 0x2e600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   29.046790] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Since unreliable, found by bisecting for KASAN's use-after-free in
enqueue_timer+0x4f/0x1e0, where the timer callback is called.

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes: aae74ff9caa8 ("drm/ast: Add detect function support")
Link: https://lore.kernel.org/lkml/0f7871be-9ca6-5ae4-3a40-5db9a8fb2365@amd.com/
Cc: Ainux <ainux.wang@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: sterlingteng@gmail.com
Cc: chenhuacai@kernel.org
Cc: Chuck Lever III <chuck.lever@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021153006.92983-1-kim.phillips@amd.com
4 years agoMerge tag 'drm-misc-fixes-2021-10-21-1' of git://anongit.freedesktop.org/drm/drm...
Dave Airlie [Thu, 21 Oct 2021 19:34:54 +0000 (05:34 +1000)]
Merge tag 'drm-misc-fixes-2021-10-21-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.15-rc7:
- Rebased, to remove vc4 patches.
- Fix mxsfb crash on unload.
- Use correct sync parameters for Feixin K101-IM2BYL02.
- Assorted kmb modeset/atomic fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e66eaf89-b9b9-41f5-d0d2-dad7e59fabb5@linux.intel.com
4 years agoMerge tag 'drm-msm-fixes-2021-10-18' of https://gitlab.freedesktop.org/drm/msm into...
Dave Airlie [Thu, 21 Oct 2021 19:22:10 +0000 (05:22 +1000)]
Merge tag 'drm-msm-fixes-2021-10-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

One more fix for v5.15, to work around a power stability issue on a630
(and possibly others)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs1WPLthmd=ToDcEHm=u-7O38RAVJ2XwRoS8xPmC520vg@mail.gmail.com
4 years agousbnet: sanity check for maxpacket
Oliver Neukum [Thu, 21 Oct 2021 12:29:44 +0000 (14:29 +0200)]
usbnet: sanity check for maxpacket

maxpacket of 0 makes no sense and oopses as we need to divide
by it. Give up.

V2: fixed typo in log and stylistic issues

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: syzbot+76bb1d34ffa0adc03baa@syzkaller.appspotmail.com
Reviewed-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211021122944.21816-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: enetc: make sure all traffic classes can send large frames
Vladimir Oltean [Wed, 20 Oct 2021 17:33:40 +0000 (20:33 +0300)]
net: enetc: make sure all traffic classes can send large frames

The enetc driver does not implement .ndo_change_mtu, instead it
configures the MAC register field PTC{Traffic Class}MSDUR[MAXSDU]
statically to a large value during probe time.

The driver used to configure only the max SDU for traffic class 0, and
that was fine while the driver could only use traffic class 0. But with
the introduction of mqprio, sending a large frame into any other TC than
0 is broken.

This patch fixes that by replicating per traffic class the static
configuration done in enetc_configure_port_mac().

Fixes: cbe9e835946f ("enetc: Enable TC offloading with mqprio")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020173340.1089992-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agonet: enetc: fix ethtool counter name for PM0_TERR
Vladimir Oltean [Wed, 20 Oct 2021 16:52:06 +0000 (19:52 +0300)]
net: enetc: fix ethtool counter name for PM0_TERR

There are two counters named "MAC tx frames", one of them is actually
incorrect. The correct name for that counter should be "MAC tx error
frames", which is symmetric to the existing "MAC rx error frames".

Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20211020165206.1069889-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoptp: free 'vclock_index' in ptp_clock_release()
Yang Yingliang [Thu, 21 Oct 2021 09:13:53 +0000 (17:13 +0800)]
ptp: free 'vclock_index' in ptp_clock_release()

'vclock_index' is accessed from sysfs, it shouled be freed
in release function, so move it from ptp_clock_unregister()
to ptp_clock_release().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosfc: Don't use netif_info before net_device setup
Erik Ekman [Tue, 19 Oct 2021 22:40:16 +0000 (00:40 +0200)]
sfc: Don't use netif_info before net_device setup

Use pci_info instead to avoid unnamed/uninitialized noise:

[197088.688729] sfc 0000:01:00.0: Solarflare NIC detected
[197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F
[197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed
[197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support

Inspired by fa44821a4ddd ("sfc: don't use netif_info et al before
net_device is registered") from Heiner Kallweit.

Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosfc: Export fibre-specific supported link modes
Erik Ekman [Tue, 19 Oct 2021 21:13:32 +0000 (23:13 +0200)]
sfc: Export fibre-specific supported link modes

The 1/10GbaseT modes were set up for cards with SFP+ cages in
3497ed8c852a5 ("sfc: report supported link speeds on SFP connections").
10GbaseT was likely used since no 10G fibre mode existed.

The missing fibre modes for 1/10G were added to ethtool.h in 5711a9822144
("net: ethtool: add support for 1000BaseX and missing 10G link modes")
shortly thereafter.

The user guide available at https://support-nic.xilinx.com/wp/drivers
lists support for the following cable and transceiver types in section 2.9:
- QSFP28 100G Direct Attach Cables
- QSFP28 100G SR Optical Transceivers (with SR4 modules listed)
- SFP28 25G Direct Attach Cables
- SFP28 25G SR Optical Transceivers
- QSFP+ 40G Direct Attach Cables
- QSFP+ 40G Active Optical Cables
- QSFP+ 40G SR4 Optical Transceivers
- QSFP+ to SFP+ Breakout Direct Attach Cables
- QSFP+ to SFP+ Breakout Active Optical Cables
- SFP+ 10G Direct Attach Cables
- SFP+ 10G SR Optical Transceivers
- SFP+ 10G LR Optical Transceivers
- SFP 1000BASE‐T Transceivers
- 1G Optical Transceivers
(From user guide issue 28. Issue 16 which also includes older cards like
SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.)

Regarding SFP+ 10GBASE‐T transceivers the latest guide says:
"Solarflare adapters do not support 10GBASE‐T transceiver modules."

Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change
depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR).
Before:

$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes:   1000baseT/Full
                        10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes:  Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
        Current message level: 0x000020f7 (8439)
                               drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes

After:

$ ethtool ext
Settings for ext:
Supported ports: [ FIBRE ]
Supported link modes:   1000baseT/Full
                        1000baseX/Full
                        10000baseCR/Full
                        10000baseSR/Full
                        10000baseLR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Link partner advertised link modes:  Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: No
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 255
Transceiver: internal
Supports Wake-on: g
Wake-on: d
        Current message level: 0x000020f7 (8439)
                               drv probe link ifdown ifup rx_err tx_err hw
Link detected: yes

Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Thu, 21 Oct 2021 11:32:41 +0000 (12:32 +0100)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter fixes for net:

1) Crash due to missing initialization of timer data in
   xt_IDLETIMER, from Juhee Kang.

2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum.

3) Skip netdev events on netns removal, from Florian Westphal.

4) Add testcase to show port shadowing via UDP, also from Florian.

5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to
   unsafe access to non-linear skbuff, from Xin Long.

6) Make net/ipv4/vs/debug_level read-only from non-init netns,
   from Antoine Tenart.

7) Remove bogus invocation to bash in selftests/netfilter/nft_flowtable.sh
   also from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mlx5-fixes-2021-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Thu, 21 Oct 2021 11:11:26 +0000 (12:11 +0100)]
Merge tag 'mlx5-fixes-2021-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2021-10-20
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Thu, 21 Oct 2021 11:10:29 +0000 (12:10 +0100)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-10-20

This series contains updates to e1000e, igc, and ice drivers.

Sasha fixes an issue with dropped packets on Tiger Lake platforms for
e1000e and corrects a device ID for igc.

Tony adds missing E810 device IDs for ice.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodrm/kmb: Enable ADV bridge after modeset
Anitha Chrisanthus [Mon, 7 Jun 2021 21:17:11 +0000 (14:17 -0700)]
drm/kmb: Enable ADV bridge after modeset

On KMB, ADV bridge must be programmed and powered on prior to
MIPI DSI HW initialization.

v2: changed to atomic_bridge_chain_enable (Sam)

Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211019230719.789958-1-anitha.chrisanthus@intel.com
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/kmb: Corrected typo in handle_lcd_irq
Anitha Chrisanthus [Mon, 19 Jul 2021 23:28:51 +0000 (16:28 -0700)]
drm/kmb: Corrected typo in handle_lcd_irq

Check for Overflow bits for layer3 in the irq handler.

Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-5-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/kmb: Disable change of plane parameters
Edmund Dea [Wed, 6 Oct 2021 23:03:48 +0000 (16:03 -0700)]
drm/kmb: Disable change of plane parameters

Due to HW limitations, KMB cannot change height, width, or
pixel format after initial plane configuration.

v2: removed memset disp_cfg as it is already zero.

Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-4-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/kmb: Remove clearing DPHY regs
Edmund Dea [Tue, 20 Apr 2021 22:31:53 +0000 (15:31 -0700)]
drm/kmb: Remove clearing DPHY regs

Don't clear the shared DPHY registers common to MIPI Rx and MIPI Tx during
DSI initialization since this was causing MIPI Rx reset. Rest of the
writes are bitwise, so will not affect Mipi Rx side.

Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-3-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/kmb: Limit supported mode to 1080p
Anitha Chrisanthus [Fri, 8 Jan 2021 22:34:13 +0000 (14:34 -0800)]
drm/kmb: Limit supported mode to 1080p

KMB only supports single resolution(1080p), this commit checks for
1920x1080x60 or 1920x1080x59 in crtc_mode_valid.
Also, modes with vfp < 4 are not supported in KMB display. This change
prunes display modes with vfp < 4.

v2: added vfp check

Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display")
Co-developed-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Edmund Dea <edmund.j.dea@intel.com>
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link:https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-2-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/kmb: Work around for higher system clock
Anitha Chrisanthus [Tue, 15 Dec 2020 19:13:09 +0000 (11:13 -0800)]
drm/kmb: Work around for higher system clock

Use a different value for system clock offset in the
ppl/llp ratio calculations for clocks higher than 500 Mhz.

Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver")
Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-1-anitha.chrisanthus@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel
Dan Johansen [Wed, 18 Aug 2021 21:48:18 +0000 (23:48 +0200)]
drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel

This adjusts sync values according to the datasheet

Fixes: 1c243751c095 ("drm/panel: ilitek-ili9881c: add support for Feixin K101-IM2BYL02 panel")
Co-developed-by: Marius Gripsgard <marius@ubports.com>
Signed-off-by: Dan Johansen <strit@manjaro.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210818214818.298089-1-strit@manjaro.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agodrm: mxsfb: Fix NULL pointer dereference crash on unload
Marek Vasut [Sat, 16 Oct 2021 21:04:46 +0000 (23:04 +0200)]
drm: mxsfb: Fix NULL pointer dereference crash on unload

The mxsfb->crtc.funcs may already be NULL when unloading the driver,
in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
mxsfb_unload() leads to NULL pointer dereference.

Since all we care about is masking the IRQ and mxsfb->base is still
valid, just use that to clear and mask the IRQ.

Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Daniel Abrecht <public@danielabrecht.ch>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211016210446.171616-1-marex@denx.de
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
4 years agoMerge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client
Linus Torvalds [Wed, 20 Oct 2021 20:23:05 +0000 (10:23 -1000)]
Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two important filesystem fixes, marked for stable.

  The blocklisted superblocks issue was particularly annoying because
  for unexperienced users it essentially exacted a reboot to establish a
  new functional mount in that scenario"

* tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client:
  ceph: fix handling of "meta" errors
  ceph: skip existing superblocks that are blocklisted or shut down when mounting

4 years agoMerge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Wed, 20 Oct 2021 20:16:51 +0000 (10:16 -1000)]
Merge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz)

 - fix a kerneldoc warning (Logan Gunthorpe)

* tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC
  dma-debug: fix sg checks in debug_dma_map_sg()
  dma-mapping: fix the kerneldoc for dma_map_sgtable()

4 years agonet/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
Emeel Hakim [Mon, 18 Oct 2021 12:31:19 +0000 (15:31 +0300)]
net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags

Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet
segment (eseg) in case of IPsec crypto offload datapath are not aligned
with PRM/HW expectations.

Currently the driver always sets the l3_inner_csum flag in case of IPsec
because of the wrong usage of skb->encapsulation as indicator for inner
IPsec header since skb->encapsulation is always ON for IPsec packets
since IPsec itself is an encapsulation protocol. The above forced a
failing attempts of calculating csum of non-existing segments (like in
the IP|ESP|TCP packet case which does not have an l3_inner) which led
to lots of packet drops hence the low throughput.

Fix by using xo->inner_ipproto as indicator for inner IPsec header
instead of skb->encapsulation in addition to setting the csum flags
as following:
* Tunnel Mode:
* Pkt: MAC  IP     ESP  IP    L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs
*
* Transport Mode:
* Pkt: MAC  IP     ESP  L4
* CSUM: l3_cs [ | l4_cs (checksum partial case)]
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* Pkt: MAC  IP     ESP  UDP  VXLAN  IP    L4
* CSUM: l3_cs | l3_inner_cs | l4_inner_cs

Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: IPsec: Fix a misuse of the software parser's fields
Emeel Hakim [Mon, 18 Oct 2021 12:30:09 +0000 (15:30 +0300)]
net/mlx5e: IPsec: Fix a misuse of the software parser's fields

IPsec crypto offload current Software Parser (SWP) fields settings in
the ethernet segment (eseg) are not aligned with PRM/HW expectations.
Among others in case of IP|ESP|TCP packet, current driver sets the
offsets for inner_l3 and inner_l4 although there is no inner l3/l4
headers relative to ESP header in such packets.

SWP provides the offsets for HW ,so it can be used to find csum fields
to offload the checksum, however these are not necessarily used by HW
and are used as fallback in case HW fails to parse the packet, e.g
when performing IPSec Transport Aware (IP | ESP | TCP) there is no
need to add SW parse on inner packet. So in some cases packets csum
was calculated correctly , whereas in other cases it failed. The later
faced csum errors (caused by wrong packet length calculations) which
led to lots of packet drops hence the low throughput.

Fix by setting the SWP fields as expected in a IP|ESP|TCP packet.

the following describe the expected SWP offsets:
* Tunnel Mode:
* SWP:      OutL3       InL3  InL4
* Pkt: MAC  IP     ESP  IP    L4
*
* Transport Mode:
* SWP:      OutL3       OutL4
* Pkt: MAC  IP     ESP  L4
*
* Tunnel(VXLAN TCP/UDP) over Transport Mode
* SWP:      OutL3                   InL3  InL4
* Pkt: MAC  IP     ESP  UDP  VXLAN  IP    L4

Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5e: Fix vlan data lost during suspend flow
Moshe Shemesh [Sat, 2 Oct 2021 08:15:35 +0000 (11:15 +0300)]
net/mlx5e: Fix vlan data lost during suspend flow

During suspend flow the driver calls mlx5e_destroy_vlan_table() which
does not only delete the vlans steering flow rules, but also frees the
data on currently active vlans, thus it is not restored during resume
flow.

This fix keeps the vlan data on suspend flow and frees it only on driver
remove flow.

Fixes: 6783f0a21a3c ("net/mlx5e: Dynamic alloc vlan table for netdev when needed")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: E-switch, Return correct error code on group creation failure
Dmytro Linkin [Wed, 25 Aug 2021 14:51:26 +0000 (17:51 +0300)]
net/mlx5: E-switch, Return correct error code on group creation failure

Dan Carpenter report:
The patch f47e04eb96e0: "net/mlx5: E-switch, Allow setting share/max
tx rate limits of rate groups" from May 31, 2021, leads to the
following Smatch static checker warning:

drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c:483 esw_qos_create_rate_group()
warn: passing zero to 'ERR_PTR'

If min rate normalization failed then error code may be overwritten to 0
if scheduling element destruction succeed. Ignore this value and always
return initial one.

Fixes: f47e04eb96e0 ("net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agonet/mlx5: Lag, change multipath and bonding to be mutually exclusive
Maor Dickman [Thu, 7 Oct 2021 13:05:38 +0000 (16:05 +0300)]
net/mlx5: Lag, change multipath and bonding to be mutually exclusive

Both multipath and bonding events are changing the HW LAG state
independently.
Handling one of the features events while the other is already
enabled can cause unwanted behavior, for example handling
bonding event while multipath enabled will disable the lag and
cause multipath to stop working.

Fix it by ignoring bonding event while in multipath and ignoring FIB
events while in bonding mode.

Fixes: 544fe7c2e654 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
4 years agoMerge tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Wed, 20 Oct 2021 16:13:22 +0000 (06:13 -1000)]
Merge tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Again it became bigger than wished, unfortunately, as this contains
  quite a few ASoC fixes that came up a bit late. It also includes yet
  more HD- and USB-audio quirks: I decided to merge them now, as those
  are for stable, and we'll need them sooner or later.

  Although the volumes are a bit high, all changes are device-specific
  (and reasonably small) fixes, so it should be safe for the late rc"

* tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Fix microphone sound on Jieli webcam.
  ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers
  ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
  ALSA: hda/realtek: Add quirk for Clevo PC50HS
  ALSA: usb-audio: add Schiit Hel device to quirk table
  ASoC: wm8960: Fix clock configuration on slave mode
  ASoC: cs42l42: Ensure 0dB full scale volume is used for headsets
  ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked()
  ASoC: codec: wcd938x: Add irq config support
  ASoC: DAPM: Fix missing kctl change notifications
  ASoC: Intel: bytcht_es8316: Utilize dev_err_probe() to avoid log saturation
  ASoC: Intel: bytcht_es8316: Switch to use gpiod_get_optional()
  ASoC: Intel: bytcht_es8316: Use temporary variable for struct device
  ASoC: Intel: bytcht_es8316: Get platform data via dev_get_platdata()
  ASoC: wcd938x: Fix jack detection issue
  ASoC: nau8824: Fix headphone vs headset, button-press detection no longer working
  ASoC: cs4341: Add SPI device ID table
  ASoC: pcm179x: Add missing entries SPI to device ID table
  ASoC: fsl_xcvr: Fix channel swap issue with ARC
  ASoC: pcm512x: Mend accesses to the I2S_1 and I2S_2 registers

4 years agoMerge tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Wed, 20 Oct 2021 16:11:17 +0000 (06:11 -1000)]
Merge tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "One small audit patch to add a pointer NULL check"

* tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix possible null-pointer dereference in audit_filter_rules

4 years agoice: Add missing E810 device ids
Tony Nguyen [Tue, 19 Oct 2021 20:04:16 +0000 (13:04 -0700)]
ice: Add missing E810 device ids

As part of support for E810 XXV devices, some device ids were
inadvertently left out. Add those missing ids.

Fixes: 195fb97766da ("ice: add additional E810 device id")
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
4 years agoigc: Update I226_K device ID
Sasha Neftin [Thu, 9 Sep 2021 17:49:04 +0000 (20:49 +0300)]
igc: Update I226_K device ID

The device ID for I226_K was incorrectly assigned, update the device
ID to the correct one.

Fixes: bfa5e98c9de4 ("igc: Add new device ID")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoe1000e: Fix packet loss on Tiger Lake and later
Sasha Neftin [Wed, 22 Sep 2021 06:55:42 +0000 (09:55 +0300)]
e1000e: Fix packet loss on Tiger Lake and later

Update the HW MAC initialization flow. Do not gate DMA clock from
the modPHY block. Keeping this clock will prevent dropped packets
sent in burst mode on the Kumeran interface.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377
Fixes: fb776f5d57ee ("e1000e: Add support for Tiger Lake")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoMerge tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 20 Oct 2021 16:02:58 +0000 (06:02 -1000)]
Merge tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Recursion fix for tracing.

  While cleaning up some of the tracing recursion protection logic, I
  discovered a scenario that the current design would miss, and would
  allow an infinite recursion. Removing an optimization trick that
  opened the hole fixes the issue and cleans up the code as well"

* tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have all levels of checks prevent recursion

4 years agoMerge tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 20 Oct 2021 15:56:51 +0000 (05:56 -1000)]
Merge tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux

Pull nios2 fix from Dinh Nguyen:

 - Renamed CTL_STATUS to CTL_FSTATUS to fix a redefined warning

* tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  NIOS2: irqflags: rename a redefined register name

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 20 Oct 2021 15:52:10 +0000 (05:52 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Tools:
   - kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic

  x86:
   - clean ups and fixes for bus lock vmexit and lazy allocation of rmaps
   - two fixes for SEV-ES (one more coming as soon as I get reviews)
   - fix for static_key underflow

  ARM:
   - Properly refcount pages used as a concatenated stage-2 PGD
   - Fix missing unlock when detecting the use of MTE+VM_SHARED"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV-ES: reduce ghcb_sa_len to 32 bits
  KVM: VMX: Remove redundant handling of bus lock vmexit
  KVM: kvm_stat: do not show halt_wait_ns
  KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload
  Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET"
  KVM: SEV-ES: Set guest_state_protected after VMSA update
  KVM: X86: fix lazy allocation of rmaps
  KVM: SEV-ES: fix length of string I/O
  KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
  KVM: arm64: Report corrupted refcount at EL2
  KVM: arm64: Fix host stage-2 PGD refcount
  KVM: s390: Function documentation fixes

4 years agoe1000e: Separate TGP board type from SPT
Sasha Neftin [Wed, 22 Sep 2021 06:54:49 +0000 (09:54 +0300)]
e1000e: Separate TGP board type from SPT

We have the same LAN controller on different PCHs. Separate TGP board
type from SPT which will allow for specific fixes to be applied for
TGP platforms.

Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
4 years agoucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring
Eric W. Biederman [Sat, 16 Oct 2021 17:17:30 +0000 (12:17 -0500)]
ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring

Setting cred->ucounts in cred_alloc_blank does not make sense.  The
uid and user_ns are deliberately not set in cred_alloc_blank but
instead the setting is delayed until key_change_session_keyring.

So move dealing with ucounts into key_change_session_keyring as well.

Unfortunately that movement of get_ucounts adds a new failure mode to
key_change_session_keyring.  I do not see anything stopping the parent
process from calling setuid and changing the relevant part of it's
cred while keyctl_session_to_parent is running making it fundamentally
necessary to call get_ucounts in key_change_session_keyring.  Which
means that the new failure mode cannot be avoided.

A failure of key_change_session_keyring results in a single threaded
parent keeping it's existing credentials.  Which results in the parent
process not being able to access the session keyring and whichever
keys are in the new keyring.

Further get_ucounts is only expected to fail if the number of bits in
the refernece count for the structure is too few.

Since the code has no other way to report the failure of get_ucounts
and because such failures are not expected to be common add a WARN_ONCE
to report this problem to userspace.

Between the WARN_ONCE and the parent process not having access to
the keys in the new session keyring I expect any failure of get_ucounts
will be noticed and reported and we can find another way to handle this
condition.  (Possibly by just making ucounts->count an atomic_long_t).

Cc: stable@vger.kernel.org
Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
4 years agoptp: Fix possible memory leak in ptp_clock_register()
Yang Yingliang [Wed, 20 Oct 2021 08:18:34 +0000 (16:18 +0800)]
ptp: Fix possible memory leak in ptp_clock_register()

I got memory leak as follows when doing fault injection test:

unreferenced object 0xffff88800906c618 (size 8):
  comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
  hex dump (first 8 bytes):
    70 74 70 30 00 00 00 00                          ptp0....
  backtrace:
    [<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
    [<0000000079f6e2ff>] kvasprintf+0xb5/0x150
    [<0000000026aae54f>] kvasprintf_const+0x60/0x190
    [<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
    [<000000004e35abdd>] dev_set_name+0xc0/0x100
    [<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
    [<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]

When posix_clock_register() returns an error, the name allocated
in dev_set_name() will be leaked, the put_device() should be used
to give up the device reference, then the name will be freed in
kobject_cleanup() and other memory will be freed in ptp_clock_release().

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: Fix E2E delay mechanism
Kurt Kanzenbach [Wed, 20 Oct 2021 07:04:33 +0000 (09:04 +0200)]
net: stmmac: Fix E2E delay mechanism

When utilizing End to End delay mechanism, the following error messages show up:

|root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
|ptp4l[950.573]: selected /dev/ptp3 as PTP clock
|ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
|ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
|ptp4l[956.879]: port 1: assuming the grand master role
|ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
|ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
|ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
|ptp4l[963.090]: port 1: received DELAY_REQ without timestamp

Commit f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP
packets in dwmac v5.10a") already addresses this problem for the dwmac
v5.10. However, same holds true for all dwmacs above version v4.10. Correct the
check accordingly. Afterwards everything works as expected.

Tested on Intel Atom(R) x6414RE Processor.

Fixes: 14f347334bf2 ("net: stmmac: Correctly take timestamp for PTPv2")
Fixes: f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfc: st95hf: Make spi remove() callback return zero
Uwe Kleine-König [Tue, 19 Oct 2021 20:49:16 +0000 (22:49 +0200)]
nfc: st95hf: Make spi remove() callback return zero

If something goes wrong in the remove callback, returning an error code
just results in an error message. The device still disappears.

So don't skip disabling the regulator in st95hf_remove() if resetting
the controller via spi fails. Also don't return an error code which just
results in two error messages.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hns3-fixes'
David S. Miller [Wed, 20 Oct 2021 10:38:11 +0000 (11:38 +0100)]
Merge branch 'hns3-fixes'

Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: disable sriov before unload hclge layer
Peng Li [Tue, 19 Oct 2021 14:16:35 +0000 (22:16 +0800)]
net: hns3: disable sriov before unload hclge layer

HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko.
hns3.ko includes network stack and pci_driver, hclge.ko includes
HW device action, algo_ops and timer task, hnae3.ko includes some
register function.

When SRIOV is enable and hclge.ko is removed, HW device is unloaded
but VF still exists, PF will not reply VF mbx messages, and cause
errors.

This patch fix it by disable SRIOV before remove hclge.ko.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: fix vf reset workqueue cannot exit
Yufeng Mo [Tue, 19 Oct 2021 14:16:34 +0000 (22:16 +0800)]
net: hns3: fix vf reset workqueue cannot exit

The task of VF reset is performed through the workqueue. It checks the
value of hdev->reset_pending to determine whether to exit the loop.
However, the value of hdev->reset_pending may also be assigned by
the interrupt function hclgevf_misc_irq_handle(), which may cause the
loop fail to exit and keep occupying the workqueue. This loop is not
necessary, so remove it and the workqueue will be rescheduled if the
reset needs to be retried or a new reset occurs.

Fixes: 1cc9bc6e5867 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: schedule the polling again when allocation fails
Yunsheng Lin [Tue, 19 Oct 2021 14:16:33 +0000 (22:16 +0800)]
net: hns3: schedule the polling again when allocation fails

Currently when there is a rx page allocation failure, it is
possible that polling may be stopped if there is no more packet
to be reveiced, which may cause queue stall problem under memory
pressure.

This patch makes sure polling is scheduled again when there is
any rx page allocation failure, and polling will try to allocate
receive buffers until it succeeds.

Now the allocation retry is added, it is unnecessary to do the rx
page allocation at the end of rx cleaning, so remove it. And reset
the unused_count to zero after calling hns3_nic_alloc_rx_buffers()
to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
memory pressure.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: fix for miscalculation of rx unused desc
Yunsheng Lin [Tue, 19 Oct 2021 14:16:32 +0000 (22:16 +0800)]
net: hns3: fix for miscalculation of rx unused desc

rx unused desc is the desc that need attatching new buffer
before refilling to hw to receive new packet, the number of
desc need attatching new buffer is calculated using next_to_use
and next_to_clean. when next_to_use == next_to_clean, currently
hns3 driver assumes that all the desc has the buffer attatched,
but 'next_to_use == next_to_clean' also means all the desc need
attatching new buffer if hw has comsumed all the desc and the
driver has not attatched any buffer to the desc yet.

This patch adds 'refill' in desc_cb to indicate whether a new
buffer has been refilled to a desc.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: fix the max tx size according to user manual
Yunsheng Lin [Tue, 19 Oct 2021 14:16:31 +0000 (22:16 +0800)]
net: hns3: fix the max tx size according to user manual

Currently the max tx size supported by the hw is calculated by
using the max BD num supported by the hw. According to the hw
user manual, the max tx size is fixed value for both non-TSO and
TSO skb.

This patch updates the max tx size according to the manual.

Fixes: 8ae10cfb5089("net: hns3: support tx-scatter-gather-fraglist feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: add limit ets dwrr bandwidth cannot be 0
Guangbin Huang [Tue, 19 Oct 2021 14:16:30 +0000 (22:16 +0800)]
net: hns3: add limit ets dwrr bandwidth cannot be 0

If ets dwrr bandwidth of tc is set to 0, the hardware will switch to SP
mode. In this case, this tc may occupy all the tx bandwidth if it has
huge traffic, so it violates the purpose of the user setting.

To fix this problem, limit the ets dwrr bandwidth must greater than 0.

Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: reset DWRR of unused tc to zero
Guangbin Huang [Tue, 19 Oct 2021 14:16:29 +0000 (22:16 +0800)]
net: hns3: reset DWRR of unused tc to zero

Currently, DWRR of tc will be initialized to a fixed value when this tc
is enabled, but it is not been reset to 0 when this tc is disabled. It
cause a problem that the DWRR of unused tc is not 0 after using tc tool
to add and delete multi-tc parameters.

For examples, after enabling 4 TCs and restoring to 1 TC by follow
tc commands:

$ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
  8@0 8@8 8@16 8@24 hw 1 mode channel
$ tc qdisc del dev eth0 root

Now there is just one TC is enabled for eth0, but the tc info querying by
debugfs is shown as follow:

$ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
enabled tc number: 1
weight_offset: 14
TC    MODE  WEIGHT
0     dwrr    100
1     dwrr    100
2     dwrr    100
3     dwrr    100
4     dwrr      0
5     dwrr      0
6     dwrr      0
7     dwrr      0

This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.

Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: Add configuration of TM QCN error event
Jiaran Zhang [Tue, 19 Oct 2021 14:16:28 +0000 (22:16 +0800)]
net: hns3: Add configuration of TM QCN error event

Add configuration of interrupt type and fifo interrupt enable of TM QCN
error event if enabled, otherwise this event will not be reported when
there is error.

Fixes: d914971df022 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agopowerpc/smp: do not decrement idle task preempt count in CPU offline
Nathan Lynch [Fri, 15 Oct 2021 17:39:02 +0000 (12:39 -0500)]
powerpc/smp: do not decrement idle task preempt count in CPU offline

With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8279d2 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef6979c ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a376ca0c ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef6979c ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
4 years agopowerpc/idle: Don't corrupt back chain when going idle
Michael Ellerman [Wed, 20 Oct 2021 09:48:26 +0000 (20:48 +1100)]
powerpc/idle: Don't corrupt back chain when going idle

In isa206_idle_insn_mayloss() we store various registers into the stack
red zone, which is allowed.

However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
to 0(r1), which corrupts the stack back chain.

We used to do the same in isa206_idle_insn_mayloss() itself, but we
fixed that in 73287caa9210 ("powerpc64/idle: Fix SP offsets when saving
GPRs"), however we missed that the macro also corrupts the back chain.

Corrupting the back chain is bad for debuggability but doesn't
necessarily cause a bug.

However we recently changed the stack handling in some KVM code, and it
now relies on the stack back chain being valid when it returns. The
corruption causes that code to return with r1 pointing somewhere in
kernel data, at some point LR is restored from the stack and we branch
to NULL or somewhere else invalid.

Only affects Power8 hosts running KVM guests, with dynamic_mt_modes
enabled (which it is by default).

The fixes tag below points to the commit that changed the KVM stack
handling, exposing this bug. The actual corruption of the back chain has
always existed since 948cf67c4726 ("powerpc: Add NAP mode support on
Power7 in HV mode").

Fixes: 9b4416c5095c ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au
4 years agovrf: Revert "Reset skb conntrack connection..."
Eugene Crosser [Mon, 18 Oct 2021 18:22:50 +0000 (20:22 +0200)]
vrf: Revert "Reset skb conntrack connection..."

This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1.

When an interface is enslaved in a VRF, prerouting conntrack hook is
called twice: once in the context of the original input interface, and
once in the context of the VRF interface. If no special precausions are
taken, this leads to creation of two conntrack entries instead of one,
and breaks SNAT.

Commit above was intended to avoid creation of extra conntrack entries
when input interface is enslaved in a VRF. It did so by resetting
conntrack related data associated with the skb when it enters VRF context.

However it breaks netfilter operation. Imagine a use case when conntrack
zone must be assigned based on the original input interface, rather than
VRF interface (that would make original interfaces indistinguishable). One
could create netfilter rules similar to these:

        chain rawprerouting {
                type filter hook prerouting priority raw;
                iif realiface1 ct zone set 1 return
                iif realiface2 ct zone set 2 return
        }

This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent
on the conntrack zone does not work.

Here is a reproducer script that demonstrates the difference in behaviour.

==========
#!/bin/sh

# This script demonstrates unexpected change of nftables behaviour
# caused by commit 09e856d54bda5f28 ""vrf: Reset skb conntrack
# connection on VRF rcv"
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
#
# Before the commit, it was possible to assign conntrack zone to a
# packet (or mark it for `notracking`) in the prerouting chanin, raw
# priority, based on the `iif` (interface from which the packet
# arrived).
# After the change, # if the interface is enslaved in a VRF, such
# assignment is lost. Instead, assignment based on the `iif` matching
# the VRF master interface is honored. Thus it is impossible to
# distinguish packets based on the original interface.
#
# This script demonstrates this change of behaviour: conntrack zone 1
# or 2 is assigned depending on the match with the original interface
# or the vrf master interface. It can be observed that conntrack entry
# appears in different zone in the kernel versions before and after
# the commit.

IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30

ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct

ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout

nft -f - <<__END__
table testct {
chain rawpre {
type filter hook prerouting priority raw;
iif { veout, tvrf } meta nftrace set 1
iif veout ct zone set 1 return
iif tvrf ct zone set 2 return
notrack
}
chain rawout {
type filter hook output priority raw;
notrack
}
}
__END__

uname -rv
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L

Signed-off-by: Eugene Crosser <crosser@average.org>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()'
Christophe JAILLET [Mon, 18 Oct 2021 19:59:00 +0000 (21:59 +0200)]
net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()'

If we return before the end of the 'for_each_child_of_node()' iterator, the
reference taken on 'port' must be released.

Add the missing 'of_node_put()' calls.

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 years agoucounts: Proper error handling in set_cred_ucounts
Eric W. Biederman [Sat, 16 Oct 2021 17:47:51 +0000 (12:47 -0500)]
ucounts: Proper error handling in set_cred_ucounts

Instead of leaking the ucounts in new if alloc_ucounts fails, store
the result of alloc_ucounts into a temporary variable, which is later
assigned to new->ucounts.

Cc: stable@vger.kernel.org
Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/87pms2s0v8.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
4 years agoucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds
Eric W. Biederman [Sat, 16 Oct 2021 17:30:00 +0000 (12:30 -0500)]
ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds

The purpose of inc_rlimit_ucounts and dec_rlimit_ucounts in commit_creds
is to change which rlimit counter is used to track a process when the
credentials changes.

Use the same test for both to guarantee the tracking is correct.

Cc: stable@vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Link: https://lkml.kernel.org/r/87v91us0w4.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Tue, 19 Oct 2021 15:41:36 +0000 (05:41 -1000)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "19 patches.

  Subsystems affected by this patch series: mm (userfaultfd, migration,
  memblock, mempolicy, slub, secretmem, and thp), ocfs2, binfmt, vfs,
  and misc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: add Andrej Shadura
  mm/thp: decrease nr_thps in file's mapping on THP split
  mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()
  vfs: check fd has read access in kernel_read_file_from_fd()
  elfcore: correct reference to CONFIG_UML
  mm, slub: fix incorrect memcg slab count for bulk free
  mm, slub: fix potential use-after-free in slab_debugfs_fops
  mm, slub: fix potential memoryleak in kmem_cache_open()
  mm, slub: fix mismatch between reconstructed freelist depth and cnt
  mm, slub: fix two bugs in slab_debug_trace_open()
  mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
  memblock: check memory total_size
  ocfs2: mount fails with buffer overflow in strlen
  ocfs2: fix data corruption after conversion from inline format
  mm/migrate: fix CPUHP state to update node demotion order
  mm/migrate: add CPU hotplug to demotion #ifdef
  mm/migrate: optimize hotplug-time demotion order updates
  userfaultfd: fix a race between writeprotect and exit_mmap()
  mm/userfaultfd: selftests: fix memory corruption with thp enabled

4 years agoMerge tag 'linux-can-fixes-for-5.15-20211019' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Tue, 19 Oct 2021 12:14:36 +0000 (13:14 +0100)]
Merge tag 'linux-can-fixes-for-5.15-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-10-19

this is a pull request of a single patch for net/master.

The patch is by me and fixes the error handling in case of a FC
timeout in the TX path of the ISOTOP CAN protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocavium: Fix return values of the probe function
Zheyu Ma [Mon, 18 Oct 2021 14:32:57 +0000 (14:32 +0000)]
cavium: Fix return values of the probe function

During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomISDN: Fix return values of the probe function
Zheyu Ma [Mon, 18 Oct 2021 14:20:38 +0000 (14:20 +0000)]
mISDN: Fix return values of the probe function

During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoceph: fix handling of "meta" errors
Jeff Layton [Thu, 7 Oct 2021 18:19:49 +0000 (14:19 -0400)]
ceph: fix handling of "meta" errors

Currently, we check the wb_err too early for directories, before all of
the unsafe child requests have been waited on. In order to fix that we
need to check the mapping->wb_err later nearer to the end of ceph_fsync.

We also have an overly-complex method for tracking errors after
blocklisting. The errors recorded in cleanup_session_requests go to a
completely separate field in the inode, but we end up reporting them the
same way we would for any other error (in fsync).

There's no real benefit to tracking these errors in two different
places, since the only reporting mechanism for them is in fsync, and
we'd need to advance them both every time.

Given that, we can just remove i_meta_err, and convert the places that
used it to instead just use mapping->wb_err instead. That also fixes
the original problem by ensuring that we do a check_and_advance of the
wb_err at the end of the fsync op.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52864
Reported-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agoceph: skip existing superblocks that are blocklisted or shut down when mounting
Jeff Layton [Thu, 30 Sep 2021 12:33:13 +0000 (08:33 -0400)]
ceph: skip existing superblocks that are blocklisted or shut down when mounting

Currently when mounting, we may end up finding an existing superblock
that corresponds to a blocklisted MDS client. This means that the new
mount ends up being unusable.

If we've found an existing superblock with a client that is already
blocklisted, and the client is not configured to recover on its own,
fail the match. Ditto if the superblock has been forcibly unmounted.

While we're in here, also rename "other" to the more conventional "fsc".

Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agocan: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path
Marc Kleine-Budde [Fri, 7 May 2021 09:18:39 +0000 (11:18 +0200)]
can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path

When the a large chunk of data send and the receiver does not send a
Flow Control frame back in time, the sendmsg() does not return a error
code, but the number of bytes sent corresponding to the size of the
packet.

If a timeout occurs the isotp_tx_timer_handler() is fired, sets
sk->sk_err and calls the sk->sk_error_report() function. It was
wrongly expected that the error would be propagated to user space in
every case. For isotp_sendmsg() blocking on wait_event_interruptible()
this is not the case.

This patch fixes the problem by checking if sk->sk_err is set and
returning the error to user space.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/hartkopp/can-isotp/issues/42
Link: https://github.com/hartkopp/can-isotp/pull/43
Link: https://lore.kernel.org/all/20210507091839.1366379-1-mkl@pengutronix.de
Cc: stable@vger.kernel.org
Reported-by: Sottas Guillaume (LMB) <Guillaume.Sottas@liebherr.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 years agomailmap: add Andrej Shadura
Andrej Shadura [Mon, 18 Oct 2021 22:16:22 +0000 (15:16 -0700)]
mailmap: add Andrej Shadura

Add a mapping for my old work email for BelDisplayTech to the personal
email, and make sure the Collabora email has the correct spelling of the
first name.

Link: https://lkml.kernel.org/r/20210917091016.30232-1-andrew.shadura@collabora.co.uk
Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/thp: decrease nr_thps in file's mapping on THP split
Marek Szyprowski [Mon, 18 Oct 2021 22:16:19 +0000 (15:16 -0700)]
mm/thp: decrease nr_thps in file's mapping on THP split

Decrease nr_thps counter in file's mapping to ensure that the page cache
won't be dropped excessively on file write access if page has been
already split.

I've tried a test scenario running a big binary, kernel remaps it with
THPs, then force a THP split with /sys/kernel/debug/split_huge_pages.
During any further open of that binary with O_RDWR or O_WRITEONLY kernel
drops page cache for it, because of non-zero thps counter.

Link: https://lkml.kernel.org/r/20211012120237.2600-1-m.szyprowski@samsung.com
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache")
Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()")
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <sfoon.kim@samsung.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()
Sean Christopherson [Mon, 18 Oct 2021 22:16:16 +0000 (15:16 -0700)]
mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()

Check for a NULL page->mapping before dereferencing the mapping in
page_is_secretmem(), as the page's mapping can be nullified while gup()
is running, e.g.  by reclaim or truncation.

  BUG: kernel NULL pointer dereference, address: 0000000000000068
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G        W
  RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0
  Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be
  RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046
  RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900
  ...
  CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0
  Call Trace:
   get_user_pages_fast_only+0x13/0x20
   hva_to_pfn+0xa9/0x3e0
   try_async_pf+0xa1/0x270
   direct_page_fault+0x113/0xad0
   kvm_mmu_page_fault+0x69/0x680
   vmx_handle_exit+0xe1/0x5d0
   kvm_arch_vcpu_ioctl_run+0xd81/0x1c70
   kvm_vcpu_ioctl+0x267/0x670
   __x64_sys_ioctl+0x83/0xa0
   do_syscall_64+0x56/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Link: https://lkml.kernel.org/r/20211007231502.3552715-1-seanjc@google.com
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Stephen <stephenackerman16@gmail.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>