]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
3 years agoirqchip/gic-v3-its: Skip HP notifier when no ITS is registered
Marc Zyngier [Wed, 2 Feb 2022 10:34:54 +0000 (10:34 +0000)]
irqchip/gic-v3-its: Skip HP notifier when no ITS is registered

We have some systems out there that have both LPI support and an
ITS, but that don't expose the ITS in their firmware tables
(either because it is broken or because they run under a hypervisor
that hides it...).

Is such a configuration, we still register the HP notifier to free
the allocated tables if needed, resulting in a warning as there is
no memory to free (nothing was allocated the first place).

Fix it by keying the HP notifier on the presence of at least one
sucessfully probed ITS.

Fixes: d23bc2bc1d63 ("irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve")
Reported-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20220202103454.2480465-1-maz@kernel.org
3 years agoKVM: s390: Return error on SIDA memop on normal guest
Janis Schoetterl-Glausch [Fri, 28 Jan 2022 14:06:43 +0000 (15:06 +0100)]
KVM: s390: Return error on SIDA memop on normal guest

Refuse SIDA memops on guests which are not protected.
For normal guests, the secure instruction data address designation,
which determines the location we access, is not under control of KVM.

Fixes: 19e122776886 (KVM: S390: protvirt: Introduce instruction data area bounce buffer)
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
3 years agonvme-rdma: fix possible use-after-free in transport error_recovery work
Sagi Grimberg [Tue, 1 Feb 2022 12:54:21 +0000 (14:54 +0200)]
nvme-rdma: fix possible use-after-free in transport error_recovery work

While nvme_rdma_submit_async_event_work is checking the ctrl and queue
state before preparing the AER command and scheduling io_work, in order
to fully prevent a race where this check is not reliable the error
recovery work must flush async_event_work before continuing to destroy
the admin queue after setting the ctrl state to RESETTING such that
there is no race .submit_async_event and the error recovery handler
itself changing the ctrl state.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
3 years agonvme-tcp: fix possible use-after-free in transport error_recovery work
Sagi Grimberg [Tue, 1 Feb 2022 12:54:20 +0000 (14:54 +0200)]
nvme-tcp: fix possible use-after-free in transport error_recovery work

While nvme_tcp_submit_async_event_work is checking the ctrl and queue
state before preparing the AER command and scheduling io_work, in order
to fully prevent a race where this check is not reliable the error
recovery work must flush async_event_work before continuing to destroy
the admin queue after setting the ctrl state to RESETTING such that
there is no race .submit_async_event and the error recovery handler
itself changing the ctrl state.

Tested-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
3 years agonvme: fix a possible use-after-free in controller reset during load
Sagi Grimberg [Tue, 1 Feb 2022 12:54:19 +0000 (14:54 +0200)]
nvme: fix a possible use-after-free in controller reset during load

Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
readiness for AER submission. This may lead to a use-after-free
condition that was observed with nvme-tcp.

The race condition may happen in the following scenario:
1. driver executes its reset_ctrl_work
2. -> nvme_stop_ctrl - flushes ctrl async_event_work
3. ctrl sends AEN which is received by the host, which in turn
   schedules AEN handling
4. teardown admin queue (which releases the queue socket)
5. AEN processed, submits another AER, calling the driver to submit
6. driver attempts to send the cmd
==> use-after-free

In order to fix that, add ctrl state check to validate the ctrl
is actually able to accept the AER submission.

This addresses the above race in controller resets because the driver
during teardown should:
1. change ctrl state to RESETTING
2. flush async_event_work (as well as other async work elements)

So after 1,2, any other AER command will find the
ctrl state to be RESETTING and bail out without submitting the AER.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
3 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Wed, 2 Feb 2022 05:03:15 +0000 (21:03 -0800)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-02-01

This series contains updates to e1000e driver only.

Sasha removes CSME handshake with TGL platform as this is not supported
and is causing hardware unit hangs to be reported.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000e: Handshake with CSME starts from ADL platforms
  e1000e: Separate ADP board type from TGP
====================

Link: https://lore.kernel.org/r/20220201173754.580305-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agophy: dphy: Correct clk_pre parameter
Liu Ying [Mon, 24 Jan 2022 02:40:07 +0000 (10:40 +0800)]
phy: dphy: Correct clk_pre parameter

The D-PHY specification (v1.2) explicitly mentions that the T-CLK-PRE
parameter's unit is Unit Interval(UI) and the minimum value is 8.  Also,
kernel doc of the 'clk_pre' member of struct phy_configure_opts_mipi_dphy
mentions that it should be in UI.  However, the dphy core driver wrongly
sets 'clk_pre' to 8000, which seems to hint that it's in picoseconds.

So, let's fix the dphy core driver to correctly reflect the T-CLK-PRE
parameter's minimum value according to the D-PHY specification.

I'm assuming that all impacted custom drivers shall program values in
TxByteClkHS cycles into hardware for the T-CLK-PRE parameter.  The D-PHY
specification mentions that the frequency of TxByteClkHS is exactly 1/8
the High-Speed(HS) bit rate(each HS bit consumes one UI).  So, relevant
custom driver code is changed to program those values as
DIV_ROUND_UP(cfg->clk_pre, BITS_PER_BYTE), then.

Note that I've only tested the patch with RM67191 DSI panel on i.MX8mq EVK.
Help is needed to test with other i.MX8mq, Meson and Rockchip platforms,
as I don't have the hardwares.

Fixes: 2ed869990e14 ("phy: Add MIPI D-PHY configuration options")
Tested-by: Liu Ying <victor.liu@nxp.com> # RM67191 DSI panel on i.MX8mq EVK
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> # for phy-meson-axg-mipi-dphy.c
Tested-by: Neil Armstrong <narmstrong@baylibre.com> # for phy-meson-axg-mipi-dphy.c
Tested-by: Guido Günther <agx@sigxcpu.org> # Librem 5 (imx8mq) with it's rather picky panel
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Link: https://lore.kernel.org/r/20220124024007.1465018-1-victor.liu@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
3 years agonet/mlx5e: Avoid field-overflowing memcpy()
Kees Cook [Mon, 24 Jan 2022 17:20:28 +0000 (09:20 -0800)]
net/mlx5e: Avoid field-overflowing memcpy()

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Use flexible arrays instead of zero-element arrays (which look like they
are always overflowing) and split the cross-field memcpy() into two halves
that can be appropriately bounds-checked by the compiler.

We were doing:

#define ETH_HLEN  14
#define VLAN_HLEN  4
...
#define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
...
        struct mlx5e_tx_wqe      *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
...
        struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
        struct mlx5_wqe_data_seg *dseg = wqe->data;
...
memcpy(eseg->inline_hdr.start, xdptxd->data, MLX5E_XDP_MIN_INLINE);

target is wqe->eth.inline_hdr.start (which the compiler sees as being
2 bytes in size), but copying 18, intending to write across start
(really vlan_tci, 2 bytes). The remaining 16 bytes get written into
wqe->data[0], covering byte_count (4 bytes), lkey (4 bytes), and addr
(8 bytes).

struct mlx5e_tx_wqe {
        struct mlx5_wqe_ctrl_seg   ctrl;                 /*     0    16 */
        struct mlx5_wqe_eth_seg    eth;                  /*    16    16 */
        struct mlx5_wqe_data_seg   data[];               /*    32     0 */

        /* size: 32, cachelines: 1, members: 3 */
        /* last cacheline: 32 bytes */
};

struct mlx5_wqe_eth_seg {
        u8                         swp_outer_l4_offset;  /*     0     1 */
        u8                         swp_outer_l3_offset;  /*     1     1 */
        u8                         swp_inner_l4_offset;  /*     2     1 */
        u8                         swp_inner_l3_offset;  /*     3     1 */
        u8                         cs_flags;             /*     4     1 */
        u8                         swp_flags;            /*     5     1 */
        __be16                     mss;                  /*     6     2 */
        __be32                     flow_table_metadata;  /*     8     4 */
        union {
                struct {
                        __be16     sz;                   /*    12     2 */
                        u8         start[2];             /*    14     2 */
                } inline_hdr;                            /*    12     4 */
                struct {
                        __be16     type;                 /*    12     2 */
                        __be16     vlan_tci;             /*    14     2 */
                } insert;                                /*    12     4 */
                __be32             trailer;              /*    12     4 */
        };                                               /*    12     4 */

        /* size: 16, cachelines: 1, members: 9 */
        /* last cacheline: 16 bytes */
};

struct mlx5_wqe_data_seg {
        __be32                     byte_count;           /*     0     4 */
        __be32                     lkey;                 /*     4     4 */
        __be64                     addr;                 /*     8     8 */

        /* size: 16, cachelines: 1, members: 3 */
        /* last cacheline: 16 bytes */
};

So, split the memcpy() so the compiler can reason about the buffer
sizes.

"pahole" shows no size nor member offset changes to struct mlx5e_tx_wqe
nor struct mlx5e_umr_wqe. "objdump -d" shows no meaningful object
code changes (i.e. only source line number induced differences and
optimizations).

Fixes: b5503b994ed5 ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Use struct_group() for memcpy() region
Kees Cook [Mon, 24 Jan 2022 17:22:41 +0000 (09:22 -0800)]
net/mlx5e: Use struct_group() for memcpy() region

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Use struct_group() in struct vlan_ethhdr around members h_dest and
h_source, so they can be referenced together. This will allow memcpy()
and sizeof() to more easily reason about sizes, improve readability,
and avoid future warnings about writing beyond the end of h_dest.

"pahole" shows no size nor member offset changes to struct vlan_ethhdr.
"objdump -d" shows no object code changes.

Fixes: 34802a42b352 ("net/mlx5e: Do not modify the TX SKB")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Avoid implicit modify hdr for decap drop rule
Roi Dayan [Tue, 1 Feb 2022 13:27:48 +0000 (15:27 +0200)]
net/mlx5e: Avoid implicit modify hdr for decap drop rule

Currently the driver adds implicit modify hdr action for
decap rules on tunnel devices if the port is an ovs port.
This is also done if the action is drop and makes the modify
hdr redundant and also the FW doesn't support it and will generate
a syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fix it by adding the implicit modify hdr only for fwd actions.

Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Fixes: 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
Raed Salem [Thu, 2 Dec 2021 15:49:01 +0000 (17:49 +0200)]
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic

IPsec Tunnel mode crypto offload software parser (SWP) setting in data
path currently always set the inner L4 offset regardless of the
encapsulated L4 header type and whether it exists in the first place,
this breaks non TCP/UDP traffic as such.

Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4
header protocol is TCP/UDP.

While at it fix inner ip protocol read for setting MLX5_ETH_WQE_SWP_INNER_L4_UDP
flag to address the case where the ip header protocol is IPv6.

Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
Raed Salem [Thu, 2 Dec 2021 15:43:50 +0000 (17:43 +0200)]
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic

IPsec crypto offload always set the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packet regardless of the encapsulated L4 header type, and even if it
doesn't exists in the first place, this breaks non TCP/UDP traffic as
such.

Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.

Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Don't treat small ceil values as unlimited in HTB offload
Maxim Mikityanskiy [Tue, 18 Jan 2022 11:31:54 +0000 (13:31 +0200)]
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload

The hardware spec defines max_average_bw == 0 as "unlimited bandwidth".
max_average_bw is calculated as `ceil / BYTES_IN_MBIT`, which can become
0 when ceil is small, leading to an undesired effect of having no
bandwidth limit.

This commit fixes it by rounding up small values of ceil to 1 Mbit/s.

Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-Switch, Fix uninitialized variable modact
Maor Dickman [Sun, 30 Jan 2022 14:00:41 +0000 (16:00 +0200)]
net/mlx5: E-Switch, Fix uninitialized variable modact

The variable modact is not initialized before used in command
modify header allocation which can cause command to fail.

Fix by initializing modact with zeros.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix handling of wrong devices during bond netevent
Maor Dickman [Thu, 13 Jan 2022 13:11:42 +0000 (15:11 +0200)]
net/mlx5e: Fix handling of wrong devices during bond netevent

Current implementation of bond netevent handler only check if
the handled netdev is VF representor and it missing a check if
the VF representor is on the same phys device of the bond handling
the netevent.

Fix by adding the missing check and optimizing the check if
the netdev is VF representor so it will not access uninitialized
private data and crashes.

BUG: kernel NULL pointer dereference, address: 000000000000036c
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
Workqueue: eth3bond0 bond_mii_monitor [bonding]
RIP: 0010:mlx5e_is_uplink_rep+0xc/0x50 [mlx5_core]
RSP: 0018:ffff88812d69fd60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8881cf800000 RCX: 0000000000000000
RDX: ffff88812d69fe10 RSI: 000000000000001b RDI: ffff8881cf800880
RBP: ffff8881cf800000 R08: 00000445cabccf2b R09: 0000000000000008
R10: 0000000000000004 R11: 0000000000000008 R12: ffff88812d69fe10
R13: 00000000fffffffe R14: ffff88820c0f9000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88846fb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000036c CR3: 0000000103d80006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mlx5e_eswitch_uplink_rep+0x31/0x40 [mlx5_core]
 mlx5e_rep_is_lag_netdev+0x94/0xc0 [mlx5_core]
 mlx5e_rep_esw_bond_netevent+0xeb/0x3d0 [mlx5_core]
 raw_notifier_call_chain+0x41/0x60
 call_netdevice_notifiers_info+0x34/0x80
 netdev_lower_state_changed+0x4e/0xa0
 bond_mii_monitor+0x56b/0x640 [bonding]
 process_one_work+0x1b9/0x390
 worker_thread+0x4d/0x3d0
 ? rescuer_thread+0x350/0x350
 kthread+0x124/0x150
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

Fixes: 7e51891a237f ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix broken SKB allocation in HW-GRO
Khalid Manaa [Wed, 26 Jan 2022 12:25:55 +0000 (14:25 +0200)]
net/mlx5e: Fix broken SKB allocation in HW-GRO

In case the HW doesn't perform header-data split, it will write the whole
packet into the data buffer in the WQ, in this case the SHAMPO CQE handler
couldn't use the header entry to build the SKB, instead it should allocate
a new memory to build the SKB using the function:
mlx5e_skb_from_cqe_mpwrq_nonlinear.

Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix wrong calculation of header index in HW_GRO
Khalid Manaa [Wed, 26 Jan 2022 12:14:58 +0000 (14:14 +0200)]
net/mlx5e: Fix wrong calculation of header index in HW_GRO

The HW doesn't wrap the CQE.shampo.header_index field according to the
headers buffer size, instead it always increases it until reaching overflow
of u16 size.

Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the
CQE header_index field to find the actual header index in the headers buffer.

Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
Roi Dayan [Mon, 24 Jan 2022 11:56:26 +0000 (13:56 +0200)]
net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion

When changing mode to switchdev, rep bridge init registered to netdevice
notifier holds the devlink lock and then takes pernet_ops_rwsem.
At that time deleting a netns holds pernet_ops_rwsem and then takes
the devlink lock.

Example sequence is:
$ ip netns add foo
$ devlink dev eswitch set pci/0000:00:08.0 mode switchdev &
$ ip netns del foo

deleting netns trace:

[ 1185.365555]  ? devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.368331]  ? mutex_lock_io_nested+0x13f0/0x13f0
[ 1185.370984]  ? xt_find_table+0x40/0x100
[ 1185.373244]  ? __mutex_lock+0x24a/0x15a0
[ 1185.375494]  ? net_generic+0xa0/0x1c0
[ 1185.376844]  ? wait_for_completion_io+0x280/0x280
[ 1185.377767]  ? devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.378686]  devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.379579]  ? devlink_nl_cmd_get_dumpit+0x3a0/0x3a0
[ 1185.380557]  ? xt_find_table+0xda/0x100
[ 1185.381367]  cleanup_net+0x372/0x8e0

changing mode to switchdev trace:

[ 1185.411267]  down_write+0x13a/0x150
[ 1185.412029]  ? down_write_killable+0x180/0x180
[ 1185.413005]  register_netdevice_notifier+0x1e/0x210
[ 1185.414000]  mlx5e_rep_bridge_init+0x181/0x360 [mlx5_core]
[ 1185.415243]  mlx5e_uplink_rep_enable+0x269/0x480 [mlx5_core]
[ 1185.416464]  ? mlx5e_uplink_rep_disable+0x210/0x210 [mlx5_core]
[ 1185.417749]  mlx5e_attach_netdev+0x232/0x400 [mlx5_core]
[ 1185.418906]  mlx5e_netdev_attach_profile+0x15b/0x1e0 [mlx5_core]
[ 1185.420172]  mlx5e_netdev_change_profile+0x15a/0x1d0 [mlx5_core]
[ 1185.421459]  mlx5e_vport_rep_load+0x557/0x780 [mlx5_core]
[ 1185.422624]  ? mlx5e_stats_grp_vport_rep_num_stats+0x10/0x10 [mlx5_core]
[ 1185.424006]  mlx5_esw_offloads_rep_load+0xdb/0x190 [mlx5_core]
[ 1185.425277]  esw_offloads_enable+0xd74/0x14a0 [mlx5_core]

Fix this by registering rep bridges for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 19e9bfa044f3 ("net/mlx5: Bridge, add offload infrastructure")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
Dima Chumak [Mon, 17 Jan 2022 13:32:16 +0000 (15:32 +0200)]
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE

Only prio 1 is supported for nic mode when there is no ignore flow level
support in firmware. But for switchdev mode, which supports fixed number
of statically pre-allocated prios, this restriction is not relevant so
it can be relaxed.

Fixes: d671e109bd85 ("net/mlx5: Fix tc max supported prio for nic mode")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Reject rules with forward and drop actions
Roi Dayan [Mon, 17 Jan 2022 13:00:30 +0000 (15:00 +0200)]
net/mlx5e: TC, Reject rules with forward and drop actions

Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.

Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Use del_timer_sync in fw reset flow of halting poll
Maher Sanalla [Thu, 13 Jan 2022 13:48:48 +0000 (15:48 +0200)]
net/mlx5: Use del_timer_sync in fw reset flow of halting poll

Substitute del_timer() with del_timer_sync() in fw reset polling
deactivation flow, in order to prevent a race condition which occurs
when del_timer() is called and timer is deactivated while another
process is handling the timer interrupt. A situation that led to
the following call trace:
RIP: 0010:run_timer_softirq+0x137/0x420
<IRQ>
recalibrate_cpu_khz+0x10/0x10
ktime_get+0x3e/0xa0
? sched_clock_cpu+0xb/0xc0
__do_softirq+0xf5/0x2ea
irq_exit_rcu+0xc1/0xf0
sysvec_apic_timer_interrupt+0x9e/0xc0
asm_sysvec_apic_timer_interrupt+0x12/0x20
</IRQ>

Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix module EEPROM query
Gal Pressman [Sun, 16 Jan 2022 07:07:22 +0000 (09:07 +0200)]
net/mlx5e: Fix module EEPROM query

When querying the module EEPROM, there was a misusage of the 'offset'
variable vs the 'query.offset' field.
Fix that by always using 'offset' and assigning its value to
'query.offset' right before the mcia register read call.

While at it, the cross-pages read size adjustment was changed to be more
intuitive.

Fixes: e19b0a3474ab ("net/mlx5: Refactor module EEPROM query")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Reject rules with drop and modify hdr action
Roi Dayan [Tue, 4 Jan 2022 08:38:02 +0000 (10:38 +0200)]
net/mlx5e: TC, Reject rules with drop and modify hdr action

This kind of action is not supported by firmware and generates a
syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, ensure dev_name is null-terminated
Vlad Buslov [Thu, 6 Jan 2022 16:45:26 +0000 (18:45 +0200)]
net/mlx5: Bridge, ensure dev_name is null-terminated

Even though net_device->name is guaranteed to be null-terminated string of
size<=IFNAMSIZ, the test robot complains that return value of netdev_name()
can be larger:

In file included from include/trace/define_trace.h:102,
                    from drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:113,
                    from drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c:12:
   drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h: In function 'trace_event_raw_event_mlx5_esw_bridge_fdb_template':
>> drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:24:29: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
      24 |                             strncpy(__entry->dev_name,
         |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
      25 |                                     netdev_name(fdb->dev),
         |                                     ~~~~~~~~~~~~~~~~~~~~~~
      26 |                                     IFNAMSIZ);
         |                                     ~~~~~~~~~

This is caused by the fact that default value of IFNAMSIZ is 16, while
placeholder value that is returned by netdev_name() for unnamed net devices
is larger than that.

The offending code is in a tracing function that is only called for mlx5
representors, so there is no straightforward way to reproduce the issue but
let's fix it for correctness sake by replacing strncpy() with strscpy() to
ensure that resulting string is always null-terminated.

Fixes: 9724fd5d9c2a ("net/mlx5: Bridge, add tracepoints")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, take rtnl lock in init error handler
Vlad Buslov [Thu, 6 Jan 2022 14:40:18 +0000 (16:40 +0200)]
net/mlx5: Bridge, take rtnl lock in init error handler

The mlx5_esw_bridge_cleanup() is expected to be called with rtnl lock
taken, which is true for mlx5e_rep_bridge_cleanup() function but not for
error handling code in mlx5e_rep_bridge_init(). Add missing rtnl
lock/unlock calls and extend both mlx5_esw_bridge_cleanup() and its dual
function mlx5_esw_bridge_init() with ASSERT_RTNL() to verify the invariant
from now on.

Fixes: 7cd6a54a8285 ("net/mlx5: Bridge, handle FDB events")
Fixes: 19e9bfa044f3 ("net/mlx5: Bridge, add offload infrastructure")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Wed, 2 Feb 2022 04:39:46 +0000 (20:39 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-31

This series contains updates to i40e driver only.

Jedrzej fixes a condition check which would cause an error when
resetting bandwidth when DCB is active with one TC.

Karen resolves a null pointer dereference that could occur when removing
the driver while VSI rings are being disabled.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix reset path while removing the driver
  i40e: Fix reset bw limit when DCB enabled with 1 TC
====================

Link: https://lore.kernel.org/r/20220201000522.505909-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: macsec: Verify that send_sci is on when setting Tx sci explicitly
Lior Nahmanson [Sun, 30 Jan 2022 11:37:52 +0000 (13:37 +0200)]
net: macsec: Verify that send_sci is on when setting Tx sci explicitly

When setting Tx sci explicit, the Rx side is expected to use this
sci and not recalculate it from the packet.However, in case of Tx sci
is explicit and send_sci is off, the receiver is wrongly recalculate
the sci from the source MAC address which most likely be different
than the explicit sci.

Fix by preventing such configuration when macsec newlink is established
and return EINVAL error code on such cases.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/1643542672-29403-1-git-send-email-raeds@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Georgi Valkov [Tue, 1 Feb 2022 07:16:18 +0000 (08:16 +0100)]
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback

When rx_buf is allocated we need to account for IPHETH_IP_ALIGN,
which reduces the usable size by 2 bytes. Otherwise we have 1512
bytes usable instead of 1514, and if we receive more than 1512
bytes, ipheth_rcvbulk_callback is called with status -EOVERFLOW,
after which the driver malfunctiones and all communication stops.

Resolves ipheth 2-1:4.2: ipheth_rcvbulk_callback: urb status: -75

Fixes: f33d9e2b48a3 ("usbnet: ipheth: fix connectivity with iOS 14")
Signed-off-by: Georgi Valkov <gvalkov@abv.bg>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: https://lore.kernel.org/all/B60B8A4B-92A0-49B3-805D-809A2433B46C@abv.bg/
Link: https://lore.kernel.org/all/24851bd2769434a5fc24730dce8e8a984c5a4505.1643699778.git.jan.kiszka@siemens.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotcp: fix mem under-charging with zerocopy sendmsg()
Eric Dumazet [Tue, 1 Feb 2022 06:52:54 +0000 (22:52 -0800)]
tcp: fix mem under-charging with zerocopy sendmsg()

We got reports of following warning in inet_sock_destruct()

WARN_ON(sk_forward_alloc_get(sk));

Whenever we add a non zero-copy fragment to a pure zerocopy skb,
we have to anticipate that whole skb->truesize will be uncharged
when skb is finally freed.

skb->data_len is the payload length. But the memory truesize
estimated by __zerocopy_sg_from_iter() is page aligned.

Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoaf_packet: fix data-race in packet_setsockopt / packet_setsockopt
Eric Dumazet [Tue, 1 Feb 2022 02:23:58 +0000 (18:23 -0800)]
af_packet: fix data-race in packet_setsockopt / packet_setsockopt

When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout,
no lock is held, meaning that another thread can change po->fanout.

Given that po->fanout can only be set once during the socket lifetime
(it is only cleared from fanout_release()), we can use
READ_ONCE()/WRITE_ONCE() to document the race.

BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt

write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
 fanout_add net/packet/af_packet.c:1791 [inline]
 packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
 packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000000000000000 -> 0xffff888106f8c000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 47dceb8ecdc1 ("packet: add classic BPF fanout mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agortnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
Eric Dumazet [Tue, 1 Feb 2022 01:21:06 +0000 (17:21 -0800)]
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()

While looking at one unrelated syzbot bug, I found the replay logic
in __rtnl_newlink() to potentially trigger use-after-free.

It is better to clear master_dev and m_ops inside the loop,
in case we have to replay it.

Fixes: ba7d49b1f0f8 ("rtnetlink: provide api for getting and setting slave info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sched: fix use-after-free in tc_new_tfilter()
Eric Dumazet [Mon, 31 Jan 2022 17:20:18 +0000 (09:20 -0800)]
net: sched: fix use-after-free in tc_new_tfilter()

Whenever tc_new_tfilter() jumps back to replay: label,
we need to make sure @q and @chain local variables are cleared again,
or risk use-after-free as in [1]

For consistency, apply the same fix in tc_ctl_chain()

BUG: KASAN: use-after-free in mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
Write of size 8 at addr ffff8880985c4b08 by task syz-executor.4/1945

CPU: 0 PID: 1945 Comm: syz-executor.4 Not tainted 5.17.0-rc1-syzkaller-00495-gff58831fa02d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 mini_qdisc_pair_swap+0x1b9/0x1f0 net/sched/sch_generic.c:1581
 tcf_chain_head_change_item net/sched/cls_api.c:372 [inline]
 tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:386
 tcf_chain_tp_insert net/sched/cls_api.c:1657 [inline]
 tcf_chain_tp_insert_unique net/sched/cls_api.c:1707 [inline]
 tc_new_tfilter+0x1e67/0x2350 net/sched/cls_api.c:2086
 rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x331/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmmsg+0x195/0x470 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2647172059
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f2645aa5168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f2647285100 RCX: 00007f2647172059
RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000006
RBP: 00007f26471cc08d R08: 0000000000000000 R09: 0000000000000000
R10: 9e00000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffb3f7f02f R14: 00007f2645aa5300 R15: 0000000000022000
 </TASK>

Allocated by task 1944:
 kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:45 [inline]
 set_alloc_info mm/kasan/common.c:436 [inline]
 ____kasan_kmalloc mm/kasan/common.c:515 [inline]
 ____kasan_kmalloc mm/kasan/common.c:474 [inline]
 __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:524
 kmalloc_node include/linux/slab.h:604 [inline]
 kzalloc_node include/linux/slab.h:726 [inline]
 qdisc_alloc+0xac/0xa10 net/sched/sch_generic.c:941
 qdisc_create.constprop.0+0xce/0x10f0 net/sched/sch_api.c:1211
 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5592
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x331/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmmsg+0x195/0x470 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3609:
 kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
 kasan_set_track+0x21/0x30 mm/kasan/common.c:45
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free+0x130/0x160 mm/kasan/common.c:328
 kasan_slab_free include/linux/kasan.h:236 [inline]
 slab_free_hook mm/slub.c:1728 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1754
 slab_free mm/slub.c:3509 [inline]
 kfree+0xcb/0x280 mm/slub.c:4562
 rcu_do_batch kernel/rcu/tree.c:2527 [inline]
 rcu_core+0x7b8/0x1540 kernel/rcu/tree.c:2778
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Last potentially related work creation:
 kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
 __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
 __call_rcu kernel/rcu/tree.c:3026 [inline]
 call_rcu+0xb1/0x740 kernel/rcu/tree.c:3106
 qdisc_put_unlocked+0x6f/0x90 net/sched/sch_generic.c:1109
 tcf_block_release+0x86/0x90 net/sched/cls_api.c:1238
 tc_new_tfilter+0xc0d/0x2350 net/sched/cls_api.c:2148
 rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:5583
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x331/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmmsg+0x195/0x470 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8880985c4800
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of
 1024-byte region [ffff8880985c4800ffff8880985c4c00)
The buggy address belongs to the page:
page:ffffea0002617000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985c0
head:ffffea0002617000 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888010c41dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 1941, ts 1038999441284, free_ts 1033444432829
 prep_new_page mm/page_alloc.c:2434 [inline]
 get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271
 alloc_slab_page mm/slub.c:1799 [inline]
 allocate_slab mm/slub.c:1944 [inline]
 new_slab+0x28a/0x3b0 mm/slub.c:2004
 ___slab_alloc+0x87c/0xe90 mm/slub.c:3018
 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105
 slab_alloc_node mm/slub.c:3196 [inline]
 slab_alloc mm/slub.c:3238 [inline]
 __kmalloc+0x2fb/0x340 mm/slub.c:4420
 kmalloc include/linux/slab.h:586 [inline]
 kzalloc include/linux/slab.h:715 [inline]
 __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1335
 neigh_sysctl_register+0x2c8/0x5e0 net/core/neighbour.c:3787
 devinet_sysctl_register+0xb1/0x230 net/ipv4/devinet.c:2618
 inetdev_init+0x286/0x580 net/ipv4/devinet.c:278
 inetdev_event+0xa8a/0x15d0 net/ipv4/devinet.c:1532
 notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1919
 call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
 call_netdevice_notifiers net/core/dev.c:1945 [inline]
 register_netdevice+0x1073/0x1500 net/core/dev.c:9698
 veth_newlink+0x59c/0xa90 drivers/net/veth.c:1722
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1352 [inline]
 free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
 free_unref_page_prepare mm/page_alloc.c:3325 [inline]
 free_unref_page+0x19/0x690 mm/page_alloc.c:3404
 release_pages+0x748/0x1220 mm/swap.c:956
 tlb_batch_pages_flush mm/mmu_gather.c:50 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:243 [inline]
 tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:250
 zap_pte_range mm/memory.c:1441 [inline]
 zap_pmd_range mm/memory.c:1490 [inline]
 zap_pud_range mm/memory.c:1519 [inline]
 zap_p4d_range mm/memory.c:1540 [inline]
 unmap_page_range+0x1d1d/0x2a30 mm/memory.c:1561
 unmap_single_vma+0x198/0x310 mm/memory.c:1606
 unmap_vmas+0x16b/0x2f0 mm/memory.c:1638
 exit_mmap+0x201/0x670 mm/mmap.c:3178
 __mmput+0x122/0x4b0 kernel/fork.c:1114
 mmput+0x56/0x60 kernel/fork.c:1135
 exit_mm kernel/exit.c:507 [inline]
 do_exit+0xa3c/0x2a30 kernel/exit.c:793
 do_group_exit+0xd2/0x2f0 kernel/exit.c:935
 __do_sys_exit_group kernel/exit.c:946 [inline]
 __se_sys_exit_group kernel/exit.c:944 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Memory state around the buggy address:
 ffff8880985c4a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880985c4a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880985c4b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff8880985c4b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880985c4c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 470502de5bdb ("net: sched: unlock rules update API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220131172018.3704490-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoethernet: smc911x: fix indentation in get/set EEPROM
Jakub Kicinski [Mon, 31 Jan 2022 21:17:30 +0000 (13:17 -0800)]
ethernet: smc911x: fix indentation in get/set EEPROM

Build bot produced a smatch indentation warning,
the code looks correct but it mixes spaces and tabs.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20220131211730.3940875-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoxfs: ensure log flush at the end of a synchronous fallocate call
Dave Chinner [Mon, 31 Jan 2022 21:20:10 +0000 (13:20 -0800)]
xfs: ensure log flush at the end of a synchronous fallocate call

Since we've started treating fallocate more like a file write, we
should flush the log to disk if the user has asked for synchronous
writes either by setting it via fcntl flags, or inode flags, or with
the sync mount option.  We've already got a helper for this, so use
it.

[The original patch by Darrick was massaged by Dave to fit this patchset]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
3 years agoxfs: move xfs_update_prealloc_flags() to xfs_pnfs.c
Dave Chinner [Mon, 31 Jan 2022 21:20:10 +0000 (13:20 -0800)]
xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c

The operations that xfs_update_prealloc_flags() perform are now
unique to xfs_fs_map_blocks(), so move xfs_update_prealloc_flags()
to be a static function in xfs_pnfs.c and cut out all the
other functionality that is doesn't use anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
3 years agoxfs: set prealloc flag in xfs_alloc_file_space()
Dave Chinner [Mon, 31 Jan 2022 21:20:09 +0000 (13:20 -0800)]
xfs: set prealloc flag in xfs_alloc_file_space()

Now that we only call xfs_update_prealloc_flags() from
xfs_file_fallocate() in the case where we need to set the
preallocation flag, do this in xfs_alloc_file_space() where we
already have the inode joined into a transaction and get
rid of the call to xfs_update_prealloc_flags() from the fallocate
code.

This also means that we now correctly avoid setting the
XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
these inodes will never have preallocated extents.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
3 years agoxfs: fallocate() should call file_modified()
Dave Chinner [Mon, 31 Jan 2022 21:20:09 +0000 (13:20 -0800)]
xfs: fallocate() should call file_modified()

In XFS, we always update the inode change and modification time when
any fallocate() operation succeeds.  Furthermore, as various
fallocate modes can change the file contents (extending EOF,
punching holes, zeroing things, shifting extents), we should drop
file privileges like suid just like we do for a regular write().
There's already a VFS helper that figures all this out for us, so
use that.

The net effect of this is that we no longer drop suid/sgid if the
caller is root, but we also now drop file capabilities.

We also move the xfs_update_prealloc_flags() function so that it now
is only called by the scope that needs to set the the prealloc flag.

Based on a patch from Darrick Wong.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
3 years agoxfs: remove XFS_PREALLOC_SYNC
Dave Chinner [Mon, 31 Jan 2022 21:20:08 +0000 (13:20 -0800)]
xfs: remove XFS_PREALLOC_SYNC

Callers can acheive the same thing by calling xfs_log_force_inode()
after making their modifications. There is no need for
xfs_update_prealloc_flags() to do this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
3 years agotools: Ignore errors from `which' when searching a GCC toolchain
Jean-Philippe Brucker [Tue, 1 Feb 2022 09:31:20 +0000 (09:31 +0000)]
tools: Ignore errors from `which' when searching a GCC toolchain

When cross-building tools with clang, we run `which $(CROSS_COMPILE)gcc`
to detect whether a GCC toolchain provides the standard libraries. It is
only a helper because some distros put libraries where LLVM does not
automatically find them. On other systems, LLVM detects the libc
automatically and does not need this. There, it is completely fine not
to have a GCC at all, but some versions of `which' display an error when
the command is not found:

  which: no aarch64-linux-gnu-gcc in ($PATH)

Since the error can safely be ignored, throw it to /dev/null.

Fixes: cebdb7374577 ("tools: Help cross-building with clang")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/bpf/20220201093119.1713207-1-jean-philippe@linaro.org
3 years agoMerge tag 'spi-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Tue, 1 Feb 2022 20:39:37 +0000 (12:39 -0800)]
Merge tag 'spi-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "There are quite a few fixes that have accumilated since the merge
  window here, all driver specific and none super urgent, plus a new
  device ID for the Rockchip driver"

* tag 'spi-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: mediatek: Avoid NULL pointer crash in interrupt
  spi: dt-bindings: Fix 'reg' child node schema
  spi: bcm-qspi: check for valid cs before applying chip select
  spi: uniphier: fix reference count leak in uniphier_spi_probe()
  spi: meson-spicc: add IRQ check in meson_spicc_probe
  spi: uniphier: Fix a bug that doesn't point to private data correctly
  spi: change clk_disable_unprepare to clk_unprepare
  spi: spi-rockchip: Add rk3568-spi compatible
  spi: stm32: make SPI_MASTER_MUST_TX flags only specific to STM32F4
  spi: stm32: remove inexistant variables in struct stm32_spi_cfg comment
  spi: stm32-qspi: Update spi registering

3 years agoMerge tag 'regulator-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 1 Feb 2022 20:37:20 +0000 (12:37 -0800)]
Merge tag 'regulator-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of very minor fixes for the regulator framework, nothing at
  all urgent here"

* tag 'regulator-fix-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: MAX20086: add gpio/consumer.h
  regulator: max20086: fix error code in max20086_parse_regulators_dt()

3 years agoMerge tag 'platform-drivers-x86-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 1 Feb 2022 20:12:10 +0000 (12:12 -0800)]
Merge tag 'platform-drivers-x86-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "This consists of various build- and bug-fixes as well as a few
  hardware-id additions.

  Highlights:
   - Bunch of fixes for the new x86-android-tablets module
   - Misc other fixes
   - A couple of hw-id additions"

* tag 'platform-drivers-x86-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Fix incorrect use of platform profile on AMD platforms
  platform/x86: amd-pmc: Correct usage of SMU version
  platform/x86: asus-tf103c-dock: Make 2 global structs static
  platform/x86: amd-pmc: Make amd_pmc_stb_debugfs_fops static
  platform/x86: ISST: Fix possible circular locking dependency detected
  platform/x86: intel_crystal_cove_charger: Fix IRQ masking / unmasking
  platform/x86: thinkpad_acpi: Add quirk for ThinkPads without a fan
  platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
  platform/surface: Reinstate platform dependency
  platform/x86: x86-android-tablets: Trivial typo fix for MODULE_AUTHOR
  platform/x86: x86-android-tablets: Fix the buttons on CZC P10T tablet
  platform/x86: x86-android-tablets: Constify the gpiod_lookup_tables arrays
  platform/x86: x86-android-tablets: Add an init() callback to struct x86_dev_info
  platform/x86: x86-android-tablets: Add support for disabling ACPI _AEI handlers
  platform/x86: x86-android-tablets: Correct crystal_cove_charger module name

3 years agoMerge tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Tue, 1 Feb 2022 19:23:02 +0000 (11:23 -0800)]
Merge tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix a regression introduced in v5.15, affecting copy up of files with
  'noatime' or 'sync' attributes to a tmpfs upper layer"

* tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: don't fail copy up if no fileattr support on upper
  ovl: fix NULL pointer dereference in copy up warning

3 years agomailmap: update Christian Brauner's email address
Christian Brauner [Mon, 31 Jan 2022 14:48:54 +0000 (15:48 +0100)]
mailmap: update Christian Brauner's email address

At least one of the addresses will stop functioning after February.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 1 Feb 2022 19:13:24 +0000 (11:13 -0800)]
Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode cleanup from Gabriel Krisman Bertazi:
 "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
  the previous CONFIG_UNICODE. It is -rc material since we don't want to
  expose the former symbol on 5.17.

  This has been living on linux-next for the past week"

* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: clean up the Kconfig symbol confusion

3 years agoMerge tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Tue, 1 Feb 2022 19:07:09 +0000 (11:07 -0800)]
Merge tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "A single audit patch to fix problems relating to audit queuing and
  system responsiveness when "audit=1" is specified on the kernel
  command line and the audit daemon is SIGSTOP'd for an extended period
  of time"

* tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: improve audit queue handling when "audit=1" on cmdline

3 years agoarm64: Enable Cortex-A510 erratum 2051678 by default
Mark Brown [Tue, 1 Feb 2022 14:48:38 +0000 (14:48 +0000)]
arm64: Enable Cortex-A510 erratum 2051678 by default

The recently added configuration option for Cortex A510 erratum 2051678 does
not have a "default y" unlike other errata fixes. This appears to simply be
an oversight since the help text suggests enabling the option if unsure and
there's nothing in the commit log to suggest it is intentional.

Fixes: 297ae1eb23b0 ("arm64: cpufeature: List early Cortex-A510 parts as having broken dbm")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220201144838.20037-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokvm/arm64: rework guest entry logic
Mark Rutland [Tue, 1 Feb 2022 13:29:23 +0000 (13:29 +0000)]
kvm/arm64: rework guest entry logic

In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
(EQS) by calling guest_enter_irqoff(), and unmasked IRQs prior to
exiting the EQS by calling guest_exit(). As the IRQ entry code will not
wake RCU in this case, we may run the core IRQ code and IRQ handler
without RCU watching, leading to various potential problems.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which caan lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

guest_timing_enter_irqoff();

guest_state_enter_irqoff();
< run the vcpu >
guest_state_exit_irqoff();

< take any pending IRQs >

guest_timing_exit_irqoff();

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_arm_enter_exit_vcpu() helper which is marked
noinstr.

Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time")
Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Message-Id: <20220201132926.3301912-3-mark.rutland@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agocgroup-v1: Require capabilities to set release_agent
Eric W. Biederman [Thu, 20 Jan 2022 17:04:01 +0000 (11:04 -0600)]
cgroup-v1: Require capabilities to set release_agent

The cgroup release_agent is called with call_usermodehelper.  The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.

Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 years agoPCI: j721e: Initialize pcie->cdns_pcie before using it
Bjorn Helgaas [Thu, 27 Jan 2022 21:49:49 +0000 (15:49 -0600)]
PCI: j721e: Initialize pcie->cdns_pcie before using it

Christian reported a NULL pointer dereference in j721e_pcie_probe() caused
by 19e863828acf ("PCI: j721e: Drop redundant struct device *"), which
removed struct j721e_pcie.dev since there's another copy in struct
cdns_pcie.dev reachable via j721e_pcie->cdns_pcie->dev.

The problem is that j721e_pcie->cdns_pcie was dereferenced before being
initialized:

  j721e_pcie_probe
    pcie = devm_kzalloc()             # struct j721e_pcie
    j721e_pcie_ctrl_init(pcie)
      dev = pcie->cdns_pcie->dev      <-- dereference cdns_pcie
    switch (mode) {
    case PCI_MODE_RC:
      cdns_pcie = ...                 # alloc as part of pci_host_bridge
      pcie->cdns_pcie = cdns_pcie     <-- initialize pcie->cdns_pcie

Move the cdns_pcie initialization earlier so it is done before it is used.
This also simplifies the error exits.

Fixes: 19e863828acf ("PCI: j721e: Drop redundant struct device *")
Link: https://lore.kernel.org/r/20220127222951.GA144828@bhelgaas
Link: https://lore.kernel.org/r/20220124122132.435743-1-christian.gmeiner@gmail.com
Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
3 years agoe1000e: Handshake with CSME starts from ADL platforms
Sasha Neftin [Tue, 7 Dec 2021 11:23:42 +0000 (13:23 +0200)]
e1000e: Handshake with CSME starts from ADL platforms

Handshake with CSME/AMT on none provisioned platforms during S0ix flow
is not supported on TGL platform and can cause to HW unit hang. Update
the handshake with CSME flow to start from the ADL platform.

Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoe1000e: Separate ADP board type from TGP
Sasha Neftin [Tue, 7 Dec 2021 11:23:06 +0000 (13:23 +0200)]
e1000e: Separate ADP board type from TGP

We have the same LAN controller on different PCH's. Separate ADP board
type from a TGP which will allow for specific fixes to be applied for
ADP platforms.

Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agocifs: Fix the readahead conversion to manage the batch when reading from cache
David Howells [Mon, 31 Jan 2022 17:54:43 +0000 (17:54 +0000)]
cifs: Fix the readahead conversion to manage the batch when reading from cache

Fix the readahead conversion to correctly manage the last batch skipping
when reading from cache.  This involves a readahead batch of one page or
one folio, so set the batch size according to the number of constituent
pages (should be 1 for a filesystem that doesn't do multipage folios yet).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
Reviewed-by: Rohith Surabattula <rohiths.msft@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Implement cache I/O by accessing the cache directly
David Howells [Thu, 27 Jan 2022 16:02:58 +0000 (16:02 +0000)]
cifs: Implement cache I/O by accessing the cache directly

Move cifs to using fscache DIO API instead of the old upstream I/O API as
that has been removed.  This is a stopgap solution as the intention is that
at sometime in the future, the cache will move to using larger blocks and
won't be able to store individual pages in order to deal with the potential
for data corruption due to the backing filesystem being able insert/remove
bridging blocks of zeros into its extent list[1].

cifs then reads and writes cache pages synchronously and one page at a time.

The preferred change would be to use the netfs lib, but the new I/O API can
be used directly.  It's just that as the cache now needs to track data for
itself, caching blocks may exceed page size...

This code is somewhat borrowed from my "fallback I/O" patchset[2].

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YO17ZNOcq+9PajfQ@mit.edu
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agonetfs, cachefiles: Add a method to query presence of data in the cache
David Howells [Thu, 27 Jan 2022 16:02:50 +0000 (16:02 +0000)]
netfs, cachefiles: Add a method to query presence of data in the cache

Add a netfs_cache_ops method by which a network filesystem can ask the
cache about what data it has available and where so that it can make a
multipage read more efficient.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Transition from ->readpages() to ->readahead()
David Howells [Thu, 27 Jan 2022 16:02:42 +0000 (16:02 +0000)]
cifs: Transition from ->readpages() to ->readahead()

Transition the cifs filesystem from using the old ->readpages() method to
using the new ->readahead() method.

For the moment, this removes any invocation of fscache to read data from
the local cache, leaving that to another patch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agotools headers UAPI: Sync linux/prctl.h with the kernel sources
Arnaldo Carvalho de Melo [Thu, 11 Feb 2021 15:50:52 +0000 (12:50 -0300)]
tools headers UAPI: Sync linux/prctl.h with the kernel sources

To pick the changes in:

  9a10064f5625d557 ("mm: add a field to store names for private anonymous memory")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

This actually adds a new prctl arg, but it has to be dealt with
differently, as it is not in sequence with the other arguments.

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Colin Cross <ccross@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf beauty: Make the prctl arg regexp more strict to cope with PR_SET_VMA
Arnaldo Carvalho de Melo [Tue, 1 Feb 2022 15:53:16 +0000 (12:53 -0300)]
perf beauty: Make the prctl arg regexp more strict to cope with PR_SET_VMA

This new PR_SET_VMA value isn't in sequence with all the other prctl
arguments and instead uses a big, 0x prefixed hex number: 0x53564d41 (S V M A).

This makes it harder to generate a string table as it would be rather
sparse, so make the regexp more stricter to avoid catching those.

A followup patch for 'perf trace' to cope with such oddities will be
needed, but then its a matter for the next merge window.

The next patch will update the prctl.h file to cope with this perf build
warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Here is the output of this script:

  $ tools/perf/trace/beauty/prctl_option.sh
  static const char *prctl_options[] = {
   [1] = "SET_PDEATHSIG",
   [2] = "GET_PDEATHSIG",
   [3] = "GET_DUMPABLE",
   [4] = "SET_DUMPABLE",
   [5] = "GET_UNALIGN",
   [6] = "SET_UNALIGN",
   [7] = "GET_KEEPCAPS",
   [8] = "SET_KEEPCAPS",
   [9] = "GET_FPEMU",
   [10] = "SET_FPEMU",
   [11] = "GET_FPEXC",
   [12] = "SET_FPEXC",
   [13] = "GET_TIMING",
   [14] = "SET_TIMING",
   [15] = "SET_NAME",
   [16] = "GET_NAME",
   [19] = "GET_ENDIAN",
   [20] = "SET_ENDIAN",
   [21] = "GET_SECCOMP",
   [22] = "SET_SECCOMP",
   [23] = "CAPBSET_READ",
   [24] = "CAPBSET_DROP",
   [25] = "GET_TSC",
   [26] = "SET_TSC",
   [27] = "GET_SECUREBITS",
   [28] = "SET_SECUREBITS",
   [29] = "SET_TIMERSLACK",
   [30] = "GET_TIMERSLACK",
   [31] = "TASK_PERF_EVENTS_DISABLE",
   [32] = "TASK_PERF_EVENTS_ENABLE",
   [33] = "MCE_KILL",
   [34] = "MCE_KILL_GET",
   [35] = "SET_MM",
   [36] = "SET_CHILD_SUBREAPER",
   [37] = "GET_CHILD_SUBREAPER",
   [38] = "SET_NO_NEW_PRIVS",
   [39] = "GET_NO_NEW_PRIVS",
   [40] = "GET_TID_ADDRESS",
   [41] = "SET_THP_DISABLE",
   [42] = "GET_THP_DISABLE",
   [43] = "MPX_ENABLE_MANAGEMENT",
   [44] = "MPX_DISABLE_MANAGEMENT",
   [45] = "SET_FP_MODE",
   [46] = "GET_FP_MODE",
   [47] = "CAP_AMBIENT",
   [50] = "SVE_SET_VL",
   [51] = "SVE_GET_VL",
   [52] = "GET_SPECULATION_CTRL",
   [53] = "SET_SPECULATION_CTRL",
   [54] = "PAC_RESET_KEYS",
   [55] = "SET_TAGGED_ADDR_CTRL",
   [56] = "GET_TAGGED_ADDR_CTRL",
   [57] = "SET_IO_FLUSHER",
   [58] = "GET_IO_FLUSHER",
   [59] = "SET_SYSCALL_USER_DISPATCH",
   [60] = "PAC_SET_ENABLED_KEYS",
   [61] = "PAC_GET_ENABLED_KEYS",
   [62] = "SCHED_CORE",
  };
  static const char *prctl_set_mm_options[] = {
   [1] = "START_CODE",
   [2] = "END_CODE",
   [3] = "START_DATA",
   [4] = "END_DATA",
   [5] = "START_STACK",
   [6] = "START_BRK",
   [7] = "BRK",
   [8] = "ARG_START",
   [9] = "ARG_END",
   [10] = "ENV_START",
   [11] = "ENV_END",
   [12] = "AUXV",
   [13] = "EXE_FILE",
   [14] = "MAP",
   [15] = "MAP_SIZE",
  };
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Colin Cross <ccross@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/lkml/YflZqY0rYQ3d1bKt@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoMerge tag 'asoc-fix-v5.17-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Tue, 1 Feb 2022 15:52:54 +0000 (16:52 +0100)]
Merge tag 'asoc-fix-v5.17-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.17

Quite a few fixes here, including an unusually large set in the core
spurred on by various testing efforts as well as the usual small driver
fixes.  There are quite a few fixes for out of bounds writes in both the
core and the various Qualcomm drivers, plus a couple of fixes for
locking in the DPCM code.

3 years agotools headers cpufeatures: Sync with the kernel sources
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:39:15 +0000 (13:39 -0300)]
tools headers cpufeatures: Sync with the kernel sources

To pick the changes from:

  690a757d610e50c2 ("kvm: x86: Add CPUID support for Intel AMX")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Jing Liu <jing2.liu@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/YflQCEO9FRLeTmlB@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/perf_event.h with the kernel sources
Arnaldo Carvalho de Melo [Fri, 21 May 2021 19:00:31 +0000 (16:00 -0300)]
tools headers UAPI: Sync linux/perf_event.h with the kernel sources

To pick the trivial change in:

  cb1c4aba055f928f ("perf: Add new macros for mem_hops field")

Just comment source code alignment.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/YflPKLhu2AtHmPov@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools include UAPI: Sync sound/asound.h copy with the kernel sources
Arnaldo Carvalho de Melo [Wed, 12 Feb 2020 14:04:23 +0000 (11:04 -0300)]
tools include UAPI: Sync sound/asound.h copy with the kernel sources

Picking the changes from:

  55b71f6c29f2a78a ("ALSA: uapi: use C90 comment style instead of C99 style")
  fb6723daf89083a0 ("ALSA: pcm: comment about relation between msbits hw parameter and [S|U]32 formats")
  b456abe63f60ad93 ("ALSA: pcm: introduce INFO_NO_REWINDS flag")
  5aec579e08e4f2be ("ALSA: uapi: Fix a C++ style comment in asound.h")

Which entails no changes in the tooling side as it doesn't introduce new
SNDRV_PCM_IOCTL_ ioctls.

To silence this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
  diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Mark Brown <broonie@kernel.org>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/all/YflN0j09T+6ODHIh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoplatform/x86: thinkpad_acpi: Fix incorrect use of platform profile on AMD platforms
Mark Pearson [Thu, 27 Jan 2022 19:03:58 +0000 (14:03 -0500)]
platform/x86: thinkpad_acpi: Fix incorrect use of platform profile on AMD platforms

Lenovo AMD based platforms have been offering platform_profiles but they
are not working correctly. This is because the mode we are using on the
Intel platforms (MMC) is not available on the AMD platforms.

This commit adds checking of the functional capabilities returned by the
BIOS to confirm if MMC is supported or not. Profiles will not be
available if the platform is not MMC capable.

I'm investigating and working on an alternative for AMD platforms but
that is still work-in-progress.

Signed-off-by: Mark Pearson <markpearson@lenovo.com>
Link: https://lore.kernel.org/r/20220127190358.4078-1-markpearson@lenovo.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
3 years agoALSA: hda: Skip codec shutdown in case the codec is not registered
Peter Ujfalusi [Tue, 1 Feb 2022 11:21:44 +0000 (13:21 +0200)]
ALSA: hda: Skip codec shutdown in case the codec is not registered

If the codec->registered is not set then it means that pm_runtime is
not yet enabled and the codec->pcm_list_head has not been initialized.

The access to the not initialized pcm_list_head will lead a kernel crash
during shutdown.

Reported-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Tested-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Fixes: b98444ed597d ("ALSA: hda: Suspend codec at shutdown")
Link: https://lore.kernel.org/r/20220201112144.29411-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agoRDMA/mlx4: Don't continue event handler after memory allocation failure
Leon Romanovsky [Mon, 31 Jan 2022 09:45:26 +0000 (11:45 +0200)]
RDMA/mlx4: Don't continue event handler after memory allocation failure

The failure to allocate memory during MLX4_DEV_EVENT_PORT_MGMT_CHANGE
event handler will cause skip the assignment logic, but
ib_dispatch_event() will be called anyway.

Fix it by calling to return instead of break after memory allocation
failure.

Fixes: 00f5ce99dc6e ("mlx4: Use port management change event instead of smp_snoop")
Link: https://lore.kernel.org/r/12a0e83f18cfad4b5f62654f141e240d04915e10.1643622264.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/siw: Fix broken RDMA Read Fence/Resume logic.
Bernard Metzler [Sun, 30 Jan 2022 17:08:15 +0000 (18:08 +0100)]
RDMA/siw: Fix broken RDMA Read Fence/Resume logic.

Code unconditionally resumed fenced SQ processing after next RDMA Read
completion, even if other RDMA Read responses are still outstanding, or
ORQ is full. Also adds comments for better readability of fence
processing, and removes orq_get_tail() helper, which is not needed
anymore.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Fixes: a531975279f3 ("rdma/siw: main include file")
Link: https://lore.kernel.org/r/20220130170815.1940-1-bmt@zurich.ibm.com
Reported-by: Jared Holzman <jared.holzman@excelero.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agokvm/x86: rework guest entry logic
Mark Rutland [Tue, 1 Feb 2022 13:29:24 +0000 (13:29 +0000)]
kvm/x86: rework guest entry logic

For consistency and clarity, migrate x86 over to the generic helpers for
guest timing and lockdep/RCU/tracing management, and remove the
x86-specific helpers.

Prior to this patch, the guest timing was entered in
kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
svm_vcpu_enter_exit()), and was exited by the call to
vtime_account_guest_exit() within vcpu_enter_guest().

To minimize duplication and to more clearly balance entry and exit, both
entry and exit of guest timing are placed in vcpu_enter_guest(), using
the new guest_timing_{enter,exit}_irqoff() helpers. When context
tracking is used a small amount of additional time will be accounted
towards guests; tick-based accounting is unnaffected as IRQs are
disabled at this point and not enabled until after the return from the
guest.

This also corrects (benign) mis-balanced context tracking accounting
introduced in commits:

  ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
  26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath")

Where KVM can enter a guest multiple times, calling vtime_guest_enter()
without a corresponding call to vtime_account_guest_exit(), and with
vtime_account_system() called when vtime_account_guest() should be used.
As account_system_time() checks PF_VCPU and calls account_guest_time(),
this doesn't result in any functional problem, but is unnecessarily
confusing.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <20220201132926.3301912-4-mark.rutland@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agokvm/mips: rework guest entry logic
Mark Rutland [Tue, 1 Feb 2022 13:29:26 +0000 (13:29 +0000)]
kvm/mips: rework guest entry logic

In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and
guest_exit_irqoff() directly, with interrupts masked between these. As
we don't handle any timer ticks during this window, we will not account
time spent within the guest as guest time, which is unfortunate.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which caan lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

guest_timing_enter_irqoff();

guest_state_enter_irqoff();
< run the vcpu >
guest_state_exit_irqoff();

< take any pending IRQs >

guest_timing_exit_irqoff();

In addition, as guest exits during the "run the vcpu" step are handled
by kvm_mips_handle_exit(), a wrapper function is added which ensures
that such exists are handled with a sequence:

guest_state_exit_irqoff();
< handle the exit >
guest_state_enter_irqoff();

This means that exits which stop the vCPU running will have a redundant
guest_state_enter_irqoff() .. guest_state_exit_irqoff() sequence, which
can be addressed with future rework.

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_mips_enter_exit_vcpu() helper which is marked
noinstr.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Message-Id: <20220201132926.3301912-6-mark.rutland@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agokvm: add guest_state_{enter,exit}_irqoff()
Mark Rutland [Tue, 1 Feb 2022 13:29:22 +0000 (13:29 +0000)]
kvm: add guest_state_{enter,exit}_irqoff()

When transitioning to/from guest mode, it is necessary to inform
lockdep, tracing, and RCU in a specific order, similar to the
requirements for transitions to/from user mode. Additionally, it is
necessary to perform vtime accounting for a window around running the
guest, with RCU enabled, such that timer interrupts taken from the guest
can be accounted as guest time.

Most architectures don't handle all the necessary pieces, and a have a
number of common bugs, including unsafe usage of RCU during the window
between guest_enter() and guest_exit().

On x86, this was dealt with across commits:

  87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
  0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
  9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
  3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
  135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
  160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
  bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")

... but those fixes are specific to x86, and as the resulting logic
(while correct) is split across generic helper functions and
x86-specific helper functions, it is difficult to see that the
entry/exit accounting is balanced.

This patch adds generic helpers which architectures can use to handle
guest entry/exit consistently and correctly. The guest_{enter,exit}()
helpers are split into guest_timing_{enter,exit}() to perform vtime
accounting, and guest_context_{enter,exit}() to perform the necessary
context tracking and RCU management. The existing guest_{enter,exit}()
heleprs are left as wrappers of these.

Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
helpers are added to handle the ordering of lockdep, tracing, and RCU
manageent. These are inteneded to mirror exit_to_user_mode() and
enter_from_user_mode().

Subsequent patches will migrate architectures over to the new helpers,
following a sequence:

guest_timing_enter_irqoff();

guest_state_enter_irqoff();
< run the vcpu >
guest_state_exit_irqoff();

< take any pending IRQs >

guest_timing_exit_irqoff();

This sequences handles all of the above correctly, and more clearly
balances the entry and exit portions, making it easier to understand.

The existing helpers are marked as deprecated, and will be removed once
all architectures have been converted.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Message-Id: <20220201132926.3301912-2-mark.rutland@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agonet: phy: Fix qca8081 with speeds lower than 2.5Gb/s
Jonathan McDowell [Mon, 31 Jan 2022 13:56:41 +0000 (13:56 +0000)]
net: phy: Fix qca8081 with speeds lower than 2.5Gb/s

A typo in qca808x_read_status means we try to set SMII mode on the port
rather than SGMII when the link speed is not 2.5Gb/s. This results in no
traffic due to the mismatch in configuration between the phy and the
mac.

v2:
 Only change interface mode when the link is up

Fixes: 79c7bc0521545 ("net: phy: add qca8081 read_status")
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan McDowell <noodles@earth.li>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agofanotify: Fix stale file descriptor in copy_event_to_user()
Dan Carpenter [Fri, 28 Jan 2022 19:57:01 +0000 (22:57 +0300)]
fanotify: Fix stale file descriptor in copy_event_to_user()

This code calls fd_install() which gives the userspace access to the fd.
Then if copy_info_records_to_user() fails it calls put_unused_fd(fd) but
that will not release it and leads to a stale entry in the file
descriptor table.

Generally you can't trust the fd after a call to fd_install().  The fix
is to delay the fd_install() until everything else has succeeded.

Fortunately it requires CAP_SYS_ADMIN to reach this code so the security
impact is less.

Fixes: f644bc449b37 ("fanotify: fix copy_event_to_user() fid error clean up")
Link: https://lore.kernel.org/r/20220128195656.GA26981@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
3 years agoKVM: x86: Move delivery of non-APICv interrupt into vendor code
Sean Christopherson [Fri, 28 Jan 2022 00:51:48 +0000 (00:51 +0000)]
KVM: x86: Move delivery of non-APICv interrupt into vendor code

Handle non-APICv interrupt delivery in vendor code, even though it means
VMX and SVM will temporarily have duplicate code.  SVM's AVIC has a race
condition that requires KVM to fall back to legacy interrupt injection
_after_ the interrupt has been logged in the vIRR, i.e. to fix the race,
SVM will need to open code the full flow anyways[*].  Refactor the code
so that the SVM bug without introducing other issues, e.g. SVM would
return "success" and thus invoke trace_kvm_apicv_accept_irq() even when
delivery through the AVIC failed, and to opportunistically prepare for
using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will
rely on the vendor function matching the kvm_x86_op pointer name.

No functional change intended.

[*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
3 years agoUSB: serial: cp210x: add CPI Bulk Coin Recycler id
Johan Hovold [Tue, 1 Feb 2022 10:42:53 +0000 (11:42 +0100)]
USB: serial: cp210x: add CPI Bulk Coin Recycler id

Add the device id for the Crane Payment Innovation / Money Controls Bulk
Coin Recycler:

https://www.cranepi.com/en/system/files/Support/OM_BCR_EN_V1-04_0.pdf

Reported-by: Scott Russell <Scott.Russell2@ncr.com>
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
3 years agoUSB: serial: cp210x: add NCR Retail IO box id
Johan Hovold [Tue, 1 Feb 2022 10:42:52 +0000 (11:42 +0100)]
USB: serial: cp210x: add NCR Retail IO box id

Add the device id for NCR's Retail IO box (CP2105) used in NCR FastLane
SelfServ Checkout - R6C:

https://www.ncr.com/product-catalog/ncr-fastlane-selfserv-checkout-r6c

Reported-by: Scott Russell <Scott.Russell2@ncr.com>
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
3 years agoUSB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
Cameron Williams [Tue, 1 Feb 2022 10:12:51 +0000 (10:12 +0000)]
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320

This patch adds support for the Brainboxes US-159, US-235 and US-320
USB-to-Serial devices.

Signed-off-by: Cameron Williams <cang1@live.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
3 years agodrm/i915/pmu: Fix KMD and GuC race on accessing busyness
Umesh Nerlige Ramappa [Tue, 25 Jan 2022 02:01:24 +0000 (18:01 -0800)]
drm/i915/pmu: Fix KMD and GuC race on accessing busyness

GuC updates shared memory and KMD reads it. Since this is not
synchronized, we run into a race where the value read is inconsistent.
Sometimes the inconsistency is in reading the upper MSB bytes of the
last_switch_in value. 2 types of cases are seen - upper 8 bits are zero
and upper 24 bits are zero. Since these are non-zero values, it is
not trivial to determine validity of these values. Instead we read the
values multiple times until they are consistent. In test runs, 3
attempts results in consistent values. The upper bound is set to 6
attempts and may need to be tuned as per any new occurences.

Since the duration that gt is parked can vary, the patch also updates
the gt timestamp on unpark before starting the worker.

v2:
- Initialize i
- Use READ_ONCE to access engine record

Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220125020124.788679-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 512712a824de9b856a4e61343e3e4390eba2c391)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
3 years agodma-buf: heaps: Fix potential spectre v1 gadget
Jordy Zomer [Sat, 29 Jan 2022 15:06:04 +0000 (16:06 +0100)]
dma-buf: heaps: Fix potential spectre v1 gadget

It appears like nr could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents
of kernel memory from being leaked to userspace via speculative
execution by using array_index_nospec.

Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Fixes: c02a81fba74f ("dma-buf: Add dma-buf heaps framework")
Cc: <stable@vger.kernel.org> # v5.6+
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
 [sumits: added fixes and cc: stable tags]
Link: https://patchwork.freedesktop.org/patch/msgid/20220129150604.3461652-1-jordy@pwning.systems
3 years agoMIPS: KVM: fix vz.c kernel-doc notation
Randy Dunlap [Sat, 29 Jan 2022 20:58:19 +0000 (12:58 -0800)]
MIPS: KVM: fix vz.c kernel-doc notation

Fix all kernel-doc warnings in mips/kvm/vz.c as reported by the
kernel test robot:

  arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_compare' not described in '_kvm_vz_save_htimer'
  arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_cause' not described in '_kvm_vz_save_htimer'
  arch/mips/kvm/vz.c:471: warning: Excess function parameter 'compare' description in '_kvm_vz_save_htimer'
  arch/mips/kvm/vz.c:471: warning: Excess function parameter 'cause' description in '_kvm_vz_save_htimer'
  arch/mips/kvm/vz.c:1551: warning: No description found for return value of 'kvm_trap_vz_handle_cop_unusable'
  arch/mips/kvm/vz.c:1552: warning: expecting prototype for kvm_trap_vz_handle_cop_unusuable(). Prototype was for kvm_trap_vz_handle_cop_unusable() instead
  arch/mips/kvm/vz.c:1597: warning: No description found for return value of 'kvm_trap_vz_handle_msa_disabled'

Fixes: c992a4f6a9b0 ("KVM: MIPS: Implement VZ support")
Fixes: f4474d50c7d4 ("KVM: MIPS/VZ: Support hardware guest timer")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: kvm@vger.kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agoMIPS: octeon: Fix missed PTR->PTR_WD conversion
Thomas Bogendoerfer [Mon, 31 Jan 2022 10:07:02 +0000 (11:07 +0100)]
MIPS: octeon: Fix missed PTR->PTR_WD conversion

Fixes: fa62f39dc7e2 ("MIPS: Fix build error due to PTR used in more places")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
3 years agoALSA: usb-audio: Correct quirk for VF0770
Jonas Hahnfeld [Mon, 31 Jan 2022 18:35:16 +0000 (19:35 +0100)]
ALSA: usb-audio: Correct quirk for VF0770

This device provides both audio and video. The original quirk added in
commit 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770") used
USB_DEVICE to match the vendor and product ID. Depending on module order,
if snd-usb-audio was asked first, it would match the entire device and
uvcvideo wouldn't get to see it. Change the matching to USB_AUDIO_DEVICE
to restore uvcvideo matching in all cases.

Fixes: 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770")
Reported-by: Jukka Heikintalo <heikintalo.jukka@gmail.com>
Tested-by: Jukka Heikintalo <heikintalo.jukka@gmail.com>
Reported-by: Paweł Susicki <pawel.susicki@gmail.com>
Tested-by: Paweł Susicki <pawel.susicki@gmail.com>
Cc: <stable@vger.kernel.org> # 5.4, 5.10, 5.14, 5.15
Signed-off-by: Jonas Hahnfeld <hahnjo@hahnjo.de>
Link: https://lore.kernel.org/r/20220131183516.61191-1-hahnjo@hahnjo.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agoALSA: Replace acpi_bus_get_device()
Rafael J. Wysocki [Wed, 26 Jan 2022 19:48:49 +0000 (20:48 +0100)]
ALSA: Replace acpi_bus_get_device()

Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/2828205.e9J7NaK4W3@kreacher
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agonet: macsec: Fix offload support for NETDEV_UNREGISTER event
Lior Nahmanson [Sun, 30 Jan 2022 11:29:01 +0000 (13:29 +0200)]
net: macsec: Fix offload support for NETDEV_UNREGISTER event

Current macsec netdev notify handler handles NETDEV_UNREGISTER event by
releasing relevant SW resources only, this causes resources leak in case
of macsec HW offload, as the underlay driver was not notified to clean
it's macsec offload resources.

Fix by calling the underlay driver to clean it's relevant resources
by moving offload handling from macsec_dellink() to macsec_common_dellink()
when handling NETDEV_UNREGISTER event.

Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/1643542141-28956-1-git-send-email-raeds@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: stmmac: dump gmac4 DMA registers correctly
Camel Guo [Mon, 31 Jan 2022 08:38:40 +0000 (09:38 +0100)]
net: stmmac: dump gmac4 DMA registers correctly

Unlike gmac100, gmac1000, gmac4 has 27 DMA registers and they are
located at DMA_CHAN_BASE_ADDR (0x1100). In order for ethtool to dump
gmac4 DMA registers correctly, this commit checks if a net_device has
gmac4 and uses different logic to dump its DMA registers.

This fixes the following KASAN warning, which can normally be triggered
by a command similar like "ethtool -d eth0":

BUG: KASAN: vmalloc-out-of-bounds in dwmac4_dump_dma_regs+0x6d4/0xb30
Write of size 4 at addr ffffffc010177100 by task ethtool/1839
 kasan_report+0x200/0x21c
 __asan_report_store4_noabort+0x34/0x60
 dwmac4_dump_dma_regs+0x6d4/0xb30
 stmmac_ethtool_gregs+0x110/0x204
 ethtool_get_regs+0x200/0x4b0
 dev_ethtool+0x1dac/0x3800
 dev_ioctl+0x7c0/0xb50
 sock_ioctl+0x298/0x6c4
 ...

Fixes: fbf68229ffe7 ("net: stmmac: unify registers dumps methods")
Signed-off-by: Camel Guo <camelg@axis.com>
Link: https://lore.kernel.org/r/20220131083841.3346801-1-camel.guo@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY
Arınç ÜNAL [Sat, 29 Jan 2022 06:27:04 +0000 (09:27 +0300)]
net: dsa: mt7530: make NET_DSA_MT7530 select MEDIATEK_GE_PHY

Make MediaTek MT753x DSA driver enable MediaTek Gigabit PHYs driver to
properly control MT7530 and MT7531 switch PHYs.

A noticeable change is that the behaviour of switchport interfaces going
up-down-up-down is no longer there.

Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220129062703.595-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agodrm/amd: Warn users about potential s0ix problems
Mario Limonciello [Tue, 11 Jan 2022 20:00:26 +0000 (14:00 -0600)]
drm/amd: Warn users about potential s0ix problems

On some OEM setups users can configure the BIOS for S3 or S2idle.
When configured to S3 users can still choose 's2idle' in the kernel by
using `/sys/power/mem_sleep`.  Before commit 6dc8265f9803 ("drm/amdgpu:
always reset the asic in suspend (v2)"), the GPU would crash.  Now when
configured this way, the system should resume but will use more power.

As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about
potential power consumption issues during their first attempt at
suspending.

Reported-by: Bjoren Dasse <bjoern.daase@gmail.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1824
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: correct the MGpuFanBoost support for Beige Goby
Evan Quan [Mon, 24 Jan 2022 05:40:35 +0000 (13:40 +0800)]
drm/amd/pm: correct the MGpuFanBoost support for Beige Goby

The existing way cannot handle Beige Goby well as a different
PPTable data structure(PPTable_beige_goby_t instead of PPTable_t)
is used there.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
3 years agoi40e: Fix reset path while removing the driver
Karen Sornek [Wed, 12 Jan 2022 09:19:47 +0000 (10:19 +0100)]
i40e: Fix reset path while removing the driver

Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.

The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.

For example changing number of queues which requires reset would
fail in the following call stack:

1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
   i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash

When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.

Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoscsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
John Garry [Mon, 31 Jan 2022 11:13:27 +0000 (19:13 +0800)]
scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal

The hisi_sas_slot.is_internal member is not set properly for ATA commands
which the driver sends directly. A TMF struct pointer is normally used as a
test to set this, but it is NULL for those commands. It's not ideal, but
pass an empty TMF struct to set that member properly.

Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com
Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code")
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agodrm/nouveau: fix off by one in BIOS boundary checking
Nick Lopez [Sat, 22 Jan 2022 08:19:06 +0000 (01:19 -0700)]
drm/nouveau: fix off by one in BIOS boundary checking

Bounds checking when parsing init scripts embedded in the BIOS reject
access to the last byte. This causes driver initialization to fail on
Apple eMac's with GeForce 2 MX GPUs, leaving the system with no working
console.

This is probably only seen on OpenFirmware machines like PowerPC Macs
because the BIOS image provided by OF is only the used parts of the ROM,
not a power-of-two blocks read from PCI directly so PCs always have
empty bytes at the end that are never accessed.

Signed-off-by: Nick Lopez <github@glowingmonkey.org>
Fixes: 4d4e9907ff572 ("drm/nouveau/bios: guard against out-of-bounds accesses to image")
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220122081906.2633061-1-github@glowingmonkey.org
3 years agoscsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
John Garry [Thu, 27 Jan 2022 13:12:52 +0000 (21:12 +0800)]
scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task

Currently a use-after-free may occur if a sas_task is aborted by the upper
layer before we handle the I/O completion in mpi_ssp_completion() or
mpi_sata_completion().

In this case, the following are the two steps in handling those I/O
completions:

 - Call complete() to inform the upper layer handler of completion of
   the I/O.

 - Release driver resources associated with the sas_task in
   pm8001_ccb_task_free() call.

When complete() is called, the upper layer may free the sas_task. As such,
we should not touch the associated sas_task afterwards, but we do so in the
pm8001_ccb_task_free() call.

Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering.

Link: https://lore.kernel.org/r/1643289172-165636-4-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: pm8001: Fix use-after-free for aborted TMF sas_task
John Garry [Thu, 27 Jan 2022 13:12:51 +0000 (21:12 +0800)]
scsi: pm8001: Fix use-after-free for aborted TMF sas_task

Currently a use-after-free may occur if a TMF sas_task is aborted before we
handle the IO completion in mpi_ssp_completion(). The abort occurs due to
timeout.

When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the
sas_task is freed in pm8001_exec_internal_tmf_task().

However, if the I/O completion occurs later, the I/O completion still
thinks that the sas_task is available. Fix this by clearing the ccb->task
if the TMF times out - the I/O completion handler does nothing if this
pointer is cleared.

Link: https://lore.kernel.org/r/1643289172-165636-3-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: pm8001: Fix warning for undescribed param in process_one_iomb()
John Garry [Thu, 27 Jan 2022 13:12:50 +0000 (21:12 +0800)]
scsi: pm8001: Fix warning for undescribed param in process_one_iomb()

make W=1 complains of an undescribed function parameter:

drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or member 'circularQ' not described in 'process_one_iomb'

Fix it.

Link: https://lore.kernel.org/r/1643289172-165636-2-git-send-email-john.garry@huawei.com
Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoi40e: Fix reset bw limit when DCB enabled with 1 TC
Jedrzej Jagielski [Tue, 14 Dec 2021 10:08:22 +0000 (10:08 +0000)]
i40e: Fix reset bw limit when DCB enabled with 1 TC

There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.

Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoxfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
Darrick J. Wong [Mon, 24 Jan 2022 23:48:31 +0000 (15:48 -0800)]
xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*

Syzbot tripped over the following complaint from the kernel:

WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597

While trying to run XFS_IOC_GETBMAP against the following structure:

struct getbmap fubar = {
.bmv_count = 0x22dae649,
};

Obviously, this is a crazy huge value since the next thing that the
ioctl would do is allocate 37GB of memory.  This is enough to make
kvmalloc mad, but isn't large enough to trip the validation functions.
In other words, I'm fussing with checks that were **already sufficient**
because that's easier than dealing with 644 internal bug reports.  Yes,
that's right, six hundred and forty-four.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
3 years agoACPI: require CRC32 to build
Randy Dunlap [Sun, 30 Jan 2022 22:38:18 +0000 (14:38 -0800)]
ACPI: require CRC32 to build

ACPI core now requires crc32() but the kernel build can fail when
CRC32 is not set/enabled, so select it in the ACPI Kconfig entry.

Fixes this build error:

ia64-linux-ld: drivers/acpi/scan.o: in function `acpi_store_pld_crc':
include/acpi/platform/aclinuxex.h:62: undefined reference to `crc32_le'

Fixes: 882c982dada4 ("acpi: Store CRC-32 hash of the _PLD in struct acpi_device")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoscsi: core: Reallocate device's budget map on queue depth change
Ming Lei [Thu, 27 Jan 2022 15:37:33 +0000 (23:37 +0800)]
scsi: core: Reallocate device's budget map on queue depth change

We currently use ->cmd_per_lun as initial queue depth for setting up the
budget_map. Martin Wilck reported that it is common for the queue_depth to
be subsequently updated in slave_configure() based on detected hardware
characteristics.

As a result, for some drivers, the static host template settings for
cmd_per_lun and can_queue won't actually get used in practice. And if the
default values are used to allocate the budget_map, memory may be consumed
unnecessarily.

Fix the issue by reallocating the budget_map after ->slave_configure()
returns. At that time the device queue_depth should accurately reflect what
the hardware needs.

Link: https://lore.kernel.org/r/20220127153733.409132-1-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Reported-by: Martin Wilck <martin.wilck@suse.com>
Suggested-by: Martin Wilck <martin.wilck@suse.com>
Tested-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agokselftest: Fix vdso_test_abi return status
Vincenzo Frascino [Mon, 31 Jan 2022 11:34:05 +0000 (11:34 +0000)]
kselftest: Fix vdso_test_abi return status

vdso_test_abi contains a batch of tests that verify the validity of the
vDSO ABI.

When a vDSO symbol is not found the relevant test is skipped reporting
KSFT_SKIP. All the tests return values are then added in a single
variable which is checked to verify failures. This approach can have
side effects which result in reporting the wrong kselftest exit status.

Fix vdso_test_abi verifying the return code of each test separately.

Cc: Shuah Khan <shuah@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoscsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
John Meneghini [Mon, 24 Jan 2022 14:51:10 +0000 (09:51 -0500)]
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe

Running tests with a debug kernel shows that bnx2fc_recv_frame() is
modifying the per_cpu lport stats counters in a non-mpsafe way.  Just boot
a debug kernel and run the bnx2fc driver with the hardware enabled.

[ 1391.699147] BUG: using smp_processor_id() in preemptible [00000000] code: bnx2fc_
[ 1391.699160] caller is bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc]
[ 1391.699174] CPU: 2 PID: 4355 Comm: bnx2fc_l2_threa Kdump: loaded Tainted: G    B
[ 1391.699180] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013
[ 1391.699183] Call Trace:
[ 1391.699188]  dump_stack_lvl+0x57/0x7d
[ 1391.699198]  check_preemption_disabled+0xc8/0xd0
[ 1391.699205]  bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc]
[ 1391.699215]  ? do_raw_spin_trylock+0xb5/0x180
[ 1391.699221]  ? bnx2fc_npiv_create_vports.isra.0+0x4e0/0x4e0 [bnx2fc]
[ 1391.699229]  ? bnx2fc_l2_rcv_thread+0xb7/0x3a0 [bnx2fc]
[ 1391.699240]  bnx2fc_l2_rcv_thread+0x1af/0x3a0 [bnx2fc]
[ 1391.699250]  ? bnx2fc_ulp_init+0xc0/0xc0 [bnx2fc]
[ 1391.699258]  kthread+0x364/0x420
[ 1391.699263]  ? _raw_spin_unlock_irq+0x24/0x50
[ 1391.699268]  ? set_kthread_struct+0x100/0x100
[ 1391.699273]  ret_from_fork+0x22/0x30

Restore the old get_cpu/put_cpu code with some modifications to reduce the
size of the critical section.

Link: https://lore.kernel.org/r/20220124145110.442335-1-jmeneghi@redhat.com
Fixes: d576a5e80cd0 ("bnx2fc: Improve stats update mechanism")
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoscsi: pm80xx: Fix double completion for SATA devices
Ajish Koshy [Mon, 24 Jan 2022 08:22:55 +0000 (13:52 +0530)]
scsi: pm80xx: Fix double completion for SATA devices

Current code handles completions for SATA devices in mpi_sata_completion()
and mpi_sata_event().

However, at the time when any SATA event happens, for almost all the event
types, the command is still in the target. It is therefore incorrect to
complete the task in sata_event().

There are some events for which we get sata_completions, some need recovery
procedure and others abort. All the tasks must be completed via
sata_completion() path.

Removed the task done related code from sata_events().  For tasks where we
don't get completions, let top layer call abort() to abort the command post
timeout.

Link: https://lore.kernel.org/r/20220124082255.86223-1-Ajish.Koshy@microchip.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Co-developed-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
3 years agoiommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
Guoqing Jiang [Fri, 28 Jan 2022 03:10:02 +0000 (11:10 +0800)]
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()

After commit e3beca48a45b ("irqdomain/treewide: Keep firmware node
unconditionally allocated"). For tear down scenario, fn is only freed
after fail to allocate ir_domain, though it also should be freed in case
dmar_enable_qi returns error.

Besides free fn, irq_domain and ir_msi_domain need to be removed as well
if intel_setup_irq_remapping fails to enable queued invalidation.

Improve the rewinding path by add out_free_ir_domain and out_free_fwnode
lables per Baolu's suggestion.

Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20220119063640.16864-1-guoqing.jiang@linux.dev
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220128031002.2219155-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>