www.infradead.org Git - users/hch/block.git/log

block: move rq_qos_exit() into disk_release()

There can't be FS IO in disk_release(), so it is safe to move rq_qos_exit()
there.

Signed-off-by: Ming Lei <ming.lei@redhat.com>

block: do more work in elevator_exit

Move the calls to ioc_clear_queue and blk_mq_sched_free_rqs into
elevator_exit. Except for one call where we know we can't have io_cq
structures yet these always go together, and that extra call in an
error path is harmless.

Signed-off-by: Christoph Hellwig <hch@lst.de>

block: move blk_exit_queue into disk_release

There can't be FS IO in disk_release(), so move blk_exit_queue() there.

We still need to freeze queue here since the request is freed after the
bio is completed and passthrough request rely on scheduler tags as well.

The disk can be released before or after queue is cleaned up, and we have
to free the scheduler request pool before blk_cleanup_queue returns,
while the static request pool has to be freed before exiting the
I/O scheduler.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: move q_usage_counter release into blk_queue_release

After blk_cleanup_queue() returns, disk may not be released yet, so
probably bio may still be submitted and ->q_usage_counter may be
touched, so far this way seems safe, but not good from API's viewpoint.

Move the release q_usage_counter into blk_queue_release().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: don't remove hctx debugfs dir from blk_mq_exit_queue

The queue's top debugfs dir is removed from blk_release_queue(), so all
hctx's debugfs dirs are removed from there. Given blk_mq_exit_queue()
is only called from blk_cleanup_queue(), it isn't necessary to remove
hctx debugfs from blk_mq_exit_queue().

So remove it from blk_mq_exit_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: move blkcg initialization/destroy into disk allocation/release handler

blkcg works on FS bio level, so it is reasonable to make both blkcg and
gendisk sharing same lifetime. Meantime there won't be any FS IO when
releasing disk, so safe to move blkcg initialization/destroy into disk
allocation/release handler

Long term, we can move blkcg into gendisk completely.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sr: implement ->free_disk

Simplify the refcounting and remove the need to clear disk->private_data
by implementing the ->free_disk method.

Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: make use of ->free_disk to simplify refcounting

Implement the ->free_disk method to to put struct scsi_disk when the last
gendisk reference count goes away. This removes the need to clear
->private_data and thus freeze the queue on unbind.

Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: delay calling free_opal_dev

Call free_opal_dev from scsi_disk_release as the opal_dev field is access
from the ioctl handler, which isn't synchronized vs sd_release and thus
can be accesses during or after sd_release was called.

Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: call sd_zbc_release_disk before releasing the scsi_device reference

sd_zbc_release_disk accesses disk->device, so ensure that actually still has
a valid reference.

Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: rename the scsi_disk.dev field

dev is very hard to grab for. Give the field a more descriptive name and
documents it's purpose.

Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: don't use disk->private_data to find the scsi_driver

Requiring every ULP to have the scsi_drive as first member of the
private data is rather fragile and not necessary anyway. Just use
the driver hanging off the SCSI device instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>

blk-mq: handle already freed tags gracefully in blk_mq_free_rqs

To simplify further changes allow for double calling blk_mq_free_rqs on
a queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: split out from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>

blk-mq: do not include passthrough requests in I/O accounting

I/O accounting buckets I/O into the read/write/discard categories into
which passthrough I/O does not fit at all. It also accounts to the
block_device, which may not even exist for passthrough I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge branch 'for-5.18/block' into for-next

* for-5.18/block:
  blk-crypto: show crypto capabilities in sysfs
  block: don't delete queue kobject before its children
  block: simplify calling convention of elv_unregister_queue()

blk-crypto: show crypto capabilities in sysfs

Add sysfs files that expose the inline encryption capabilities of
request queues:

/sys/block/$disk/queue/crypto/max_dun_bits
/sys/block/$disk/queue/crypto/modes/$mode
/sys/block/$disk/queue/crypto/num_keyslots

Userspace can use these new files to decide what encryption settings to
use, or whether to use inline encryption at all.  This also brings the
crypto capabilities in line with the other queue properties, which are
already discoverable via the queue directory in sysfs.

Design notes:

  - Place the new files in a new subdirectory "crypto" to group them
    together and to avoid complicating the main "queue" directory.  This
    also makes it possible to replace "crypto" with a symlink later if
    we ever make the blk_crypto_profiles into real kobjects (see below).

  - It was necessary to define a new kobject that corresponds to the
    crypto subdirectory.  For now, this kobject just contains a pointer
    to the blk_crypto_profile.  Note that multiple queues (and hence
    multiple such kobjects) may refer to the same blk_crypto_profile.

    An alternative design would more closely match the current kernel
    data structures: the blk_crypto_profile could be a kobject itself,
    located directly under the host controller device's kobject, while
    /sys/block/$disk/queue/crypto would be a symlink to it.

    I decided not to do that for now because it would require a lot more
    changes, such as no longer embedding blk_crypto_profile in other
    structures, and also because I'm not sure we can rule out moving the
    crypto capabilities into 'struct queue_limits' in the future.  (Even
    if multiple queues share the same crypto engine, maybe the supported
    data unit sizes could differ due to other queue properties.)  It
    would also still be possible to switch to that design later without
    breaking userspace, by replacing the directory with a symlink.

  - Use "max_dun_bits" instead of "max_dun_bytes".  Currently, the
    kernel internally stores this value in bytes, but that's an
    implementation detail.  It probably makes more sense to talk about
    this value in bits, and choosing bits is more future-proof.

  - "modes" is a sub-subdirectory, since there may be multiple supported
    crypto modes, sysfs is supposed to have one value per file, and it
    makes sense to group all the mode files together.

  - Each mode had to be named.  The crypto API names like "xts(aes)" are
    not appropriate because they don't specify the key size.  Therefore,
    I assigned new names.  The exact names chosen are arbitrary, but
    they happen to match the names used in log messages in fs/crypto/.

  - The "num_keyslots" file is a bit different from the others in that
    it is only useful to know for performance reasons.  However, it's
    included as it can still be useful.  For example, a user might not
    want to use inline encryption if there aren't very many keyslots.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220124215938.2769-4-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: don't delete queue kobject before its children

kobjects aren't supposed to be deleted before their child kobjects are
deleted.  Apparently this is usually benign; however, a WARN will be
triggered if one of the child kobjects has a named attribute group:

    sysfs group 'modes' not found for kobject 'crypto'
    WARNING: CPU: 0 PID: 1 at fs/sysfs/group.c:278 sysfs_remove_group+0x72/0x80
    ...
    Call Trace:
      sysfs_remove_groups+0x29/0x40 fs/sysfs/group.c:312
      __kobject_del+0x20/0x80 lib/kobject.c:611
      kobject_cleanup+0xa4/0x140 lib/kobject.c:696
      kobject_release lib/kobject.c:736 [inline]
      kref_put include/linux/kref.h:65 [inline]
      kobject_put+0x53/0x70 lib/kobject.c:753
      blk_crypto_sysfs_unregister+0x10/0x20 block/blk-crypto-sysfs.c:159
      blk_unregister_queue+0xb0/0x110 block/blk-sysfs.c:962
      del_gendisk+0x117/0x250 block/genhd.c:610

Fix this by moving the kobject_del() and the corresponding
kobject_uevent() to the correct place.

Fixes: 2c2086afc2b8 ("block: Protect less code with sysfs_lock in blk_{un,}register_queue()")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124215938.2769-3-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: simplify calling convention of elv_unregister_queue()

Make elv_unregister_queue() a no-op if q->elevator is NULL or is not
registered.

This simplifies the existing callers, as well as the future caller in
the error path of blk_register_queue().

Also don't bother checking whether q is NULL, since it never is.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124215938.2769-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-5.18/block' into for-next

* for-5.18/block:
block: remove redundant semicolon

block: remove redundant semicolon

Remove redundant semicolon from block/bdev.c

Signed-off-by: Nian Yanchuan <yanchuan@nfschina.com>
Link: https://lore.kernel.org/r/20220227170124.GA14658@localhost.localdomain
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-5.18/io_uring-statx' into for-next

* for-5.18/io_uring-statx:
io-uring: Make statx API stable

Merge branch 'for-5.18/alloc-cleanups' into for-next

* for-5.18/alloc-cleanups:
  nilfs2: pass the operation to bio_alloc
  ext4: pass the operation to bio_alloc
  mpage: pass the operation to bio_alloc

Merge branch 'for-5.18/drivers' into for-next

* for-5.18/drivers:
  null_blk: null_alloc_page() cleanup
  null_blk: remove hardcoded null_alloc_page() param
  null_blk: remove hardcoded alloc_cmd() parameter
  loop: allow user to set the queue depth
  loop: remove extra variable in lo_req_flush
  loop: remove extra variable in lo_fallocate()
  loop: use sysfs_emit() in the sysfs xxx show()
  null_blk: fix return value from null_add_dev()
  loop: clean up grammar in warning message
  block/rnbd: Remove a useless mutex
  block/rnbd: client device does not care queue/rotational
  block/rnbd-clt: fix CHECK:BRACES warning

Merge branch 'for-5.18/block' into for-next

* for-5.18/block: (81 commits)
  block: default BLOCK_LEGACY_AUTOLOAD to y
  block: update io_ticks when io hang
  block, bfq: don't move oom_bfqq
  block, bfq: avoid moving bfqq to it's parent bfqg
  block, bfq: cleanup bfq_bfqq_to_bfqg()
  block/bfq_wf2q: correct weight to ioprio
  blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
  virtio_blk: simplify refcounting
  memstick/mspro_block: simplify refcounting
  memstick/mspro_block: fix handling of read-only devices
  memstick/ms_block: simplify refcounting
  block: add a ->free_disk method
  block: revert 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios")
  block: don't try to throttle split bio if iops limit isn't set
  block: throttle split bio in case of iops limit
  block: merge submit_bio_checks() into submit_bio_noacct
  block: don't check bio in blk_throtl_dispatch_work_fn
  block: don't declare submit_bio_checks in local header
  block: move blk_crypto_bio_prep() out of blk-mq.c
  block: move submit_bio_checks() into submit_bio_noacct
  ...

nilfs2: pass the operation to bio_alloc

Refactor the segbuf write code to pass the op to bio_alloc instead of
setting it just before the submission.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Link: https://lore.kernel.org/r/20220222154634.597067-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ext4: pass the operation to bio_alloc

Refactor the readpage code to pass the op to bio_alloc instead of setting
it just before the submission.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20220222154634.597067-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mpage: pass the operation to bio_alloc

Refactor the mpage read/write page code to pass the op to bio_alloc
instead of setting it just before the submission.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222154634.597067-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: null_alloc_page() cleanup

Remove goto labels and use direct returns as error unwinding code only
needs to free t_page variable if we alloc_pages() call fails as having
two labels for one kfree() can be avoided easily.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: remove hardcoded null_alloc_page() param

Only caller of null_alloc_page() is null_insert_page() unconditionally
sets only parameter to GFP_NOIO and that is statically hard-coded in
null_blk. There is no point in having statically hardcoded function
parameter.

Remove the unnecessary parameter gfp_flags and adjust the code, so it
can retain existing behavior null_alloc_page() with GFP_NOIO.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: remove hardcoded alloc_cmd() parameter

Only caller of alloc_cmd() is null_submit_bio() unconditionally sets
second parameter to true and that is statically hard-coded in null_blk.
There is no point in having statically hardcoded function parameter.

Remove the unnecessary parameter can_wait and adjust the code so it
can retain existing behavior of waiting when we don't get valid
nullb_cmd from __alloc_cmd() in alloc_cmd().

The restructured code avoids multiple return statements, multiple
calls to __alloc_cmd() and resulting a fast path call to
prepare_to_wait() due to removal of first alloc_cmd() call.

Follow the pattern that we have in bio_alloc() to set the structure
members in the structure allocation function in alloc_cmd() and pass
bio to initialize newly allocated cmd->bio member.

Follow the pattern in copy_to_nullb() to use result of one function call
(null_cache_active()) to be used as a parameter to another function call
(null_insert_page()), use result of alloc_cmd() as a first parameter to
the null_handle_cmd() in null_submit_bio() function. This allow us to
remove the local variable cmd on stack in null_submit_bio() that is in
fast path.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: allow user to set the queue depth

Instead of hardcoding queue depth allow user to set the hw queue depth
using module parameter. Set default value to 128 to retain the existing
behavior.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-5-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: remove extra variable in lo_req_flush

The local variable file is used to pass it to the vfs_fsync(). We can
get away with using lo->lo_backing_file instead of storing in a local
variable which is not used anywhere else.

No functional change in this patch.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-4-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: remove extra variable in lo_fallocate()

The local variable q is used to pass it to the blk_queue_discard(). We
can get away with using lo->lo_queue instead of storing in a local
variable which is not used anywhere else.

No functional change in this patch.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: use sysfs_emit() in the sysfs xxx show()

sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.

Use a generic sysfs_emit function that knows the size of the
temporary buffer and ensures that no overrun is done for offset
attribute in
loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: fix return value from null_add_dev()

The function nullb_device_power_store() returns -ENOMEM when
null_add_dev() fails. null_add_dev() can fail with return value
other than -ENOMEM such as -EINVAL when Zoned Block Device option
is used, see :

nullb_device_power_store()
null_add_dev()
null_init_zoned_dev()
return -EINVAL;

When trying to load the module having -ENOMEM value returned on the
command line creates confusion when pleanty of memory is free on the
machine.

Instead of hardcoding -ENOMEM return the value of null_add_dev()
function.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: clean up grammar in warning message

The phrase "has still" should be "still has" to clean up the grammar.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220208114656.61629-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/rnbd: Remove a useless mutex

According to lib/idr.c,
The IDA handles its own locking. It is safe to call any of the IDA
functions without synchronisation in your code.

so the 'ida_lock' mutex can just be removed.
It is here only to protect some ida_simple_get()/ida_simple_remove() calls.

While at it, switch to ida_alloc_XXX()/ida_free() instead to
ida_simple_get()/ida_simple_remove().
The latter is deprecated and more verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/7f9eccd8b1fce1bac45ac9b01a78cf72f54c0a61.1644266862.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/rnbd: client device does not care queue/rotational

On client side, the device is a network device. There is no reason
to set rotational even-if the target device on server is rotational.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-3-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/rnbd-clt: fix CHECK:BRACES warning

This patch fix the "CHECK:BRACES: braces {} should be used on all
arms of this statement" warning from checkpatch

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-2-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: default BLOCK_LEGACY_AUTOLOAD to y

As Luis reported, losetup currently doesn't properly create the loop
device without this if the device node already exists because old
scripts created it manually. So default to y for now and remove the
aggressive removal schedule.

Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220225181440.1351591-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-5.18/io_uring' into for-next

* for-5.18/io_uring:
  io_uring: documentation fixup
  io_uring: do not recalculate ppos unnecessarily
  io_uring: update kiocb->ki_pos at execution time
  io_uring: remove duplicated calls to io_kiocb_ppos
  io_uring: Remove unneeded test in io_run_task_work_sig()
  io-uring: Make tracepoints consistent.
  io-uring: add __fill_cqe function
  io-wq: use IO_WQ_ACCT_NR rather than hardcoded number
  io-wq: reduce acct->lock crossing functions lock/unlock
  io-wq: decouple work_list protection from the big wqe->lock
  io_uring: Fix use of uninitialized ret in io_eventfd_register()
  io_uring: remove ring quiesce for io_uring_register
  io_uring: avoid ring quiesce while registering restrictions and enabling rings
  io_uring: avoid ring quiesce while registering async eventfd
  io_uring: avoid ring quiesce while registering/unregistering eventfd
  io_uring: remove trace for eventfd

io-uring: Make statx API stable

One of the key architectual tenets is to keep the parameters for
io-uring stable. After the call has been submitted, its value can
be changed. Unfortunaltely this is not the case for the current statx
implementation.

IO-Uring change:
This changes replaces the const char * filename pointer in the io_statx
structure with a struct filename *. In addition it also creates the
filename object during the prepare phase.

With this change, the opcode also needs to invoke cleanup, so the
filename object gets freed after processing the request.

fs change:
This replaces the const char* __user filename parameter in the two
functions do_statx and vfs_statx with a struct filename *. In addition
to be able to correctly construct a filename object a new helper
function getname_statx_lookup_flags is introduced. The function makes
sure that do_statx and vfs_statx is invoked with the correct lookup flags.

Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220225185326.1373304-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few small driver fixes for 5.17-rc6 for reported issues.

  The majority of these are IIO fixes for small things, and the other
  two are a mvmem and mtd core conflict fix.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property
  nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
  iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
  iio: Fix error handling for PM
  iio: addac: ad74413r: correct comparator gpio getters mask usage
  iio: addac: ad74413r: use ngpio size when iterating over mask
  iio: addac: ad74413r: Do not reference negative array offsets
  iio: adc: men_z188_adc: Fix a resource leak in an error handling path
  iio: frequency: admv1013: remove the always true condition
  iio: accel: fxls8962af: add padding to regmap for SPI
  iio:imu:adis16480: fix buffering for devices with no burst mode
  iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
  iio: adc: tsc2046: fix memory corruption by preventing array overflow

Merge tag 'driver-core-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
"Here is a single driver core fix for 5.17-rc6. It resolves a reported
  problem when the DMA map of a device is not properly released.

  It has been in linux-next with no reported problems"

* tag 'driver-core-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Free DMA range map when device is released

Merge tag 'staging-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fix from Greg KH:
"Here is a single staging driver fix for 5.17-rc6.

  It resolves a reported problem in the fbtft fb_st7789v.c driver that
  could cause the display to be flipped in cold weather.

  It has been in linux-next with no reported problems"

* tag 'staging-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fbtft: fb_st7789v: reset display before initialization

Merge tag 'tty-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are some small n_gsm and sc16is7xx serial driver fixes for
  5.17-rc6.

  The n_gsm fixes are from Siemens as it seems they are using the line
  discipline and fixing up a number of issues they found in their
  testing. The sc16is7xx serial driver fix is for a reported problem
  with that chip.

  All of these have been in linux-next with no reported problems"

* tag 'tty-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  sc16is7xx: Fix for incorrect data being transmitted
  tty: n_gsm: fix deadlock in gsmtty_open()
  tty: n_gsm: fix wrong modem processing in convergence layer type 2
  tty: n_gsm: fix wrong tty control line for flow control
  tty: n_gsm: fix NULL pointer access due to DLCI release
  tty: n_gsm: fix proper link termination after failed open
  tty: n_gsm: fix encoding of command/response bit
  tty: n_gsm: fix encoding of control signal octet bit DV

Merge tag 'usb-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.17-rc6 to resolve
  reported problems and add new device ids. They include:

   - dwc3:
      - device mapping fix
      - new device ids
      - driver fixes

   - xhci driver fixes

   - gadget driver fixes

   - usb-serial driver device id updates

  All of these have been in linux-next with no reported problems"

* tag 'usb-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: gadget: rndis: add spinlock for rndis response list
  usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
  USB: gadget: validate endpoint index for xilinx udc
  USB: serial: option: add Telit LE910R1 compositions
  USB: serial: option: add support for DW5829e
  Revert "USB: serial: ch341: add new Product ID for CH341A"
  usb: dwc2: drd: fix soft connect when gadget is unconfigured
  usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
  tps6598x: clear int mask on probe failure
  xhci: Prevent futile URB re-submissions due to incorrect return value.
  xhci: re-initialize the HC during resume if HCE was set
  usb: dwc3: pci: Add "snps,dis_u2_susphy_quirk" for Intel Bay Trail
  usb: dwc3: pci: add support for the Intel Raptor Lake-S

Merge tag 'ata-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:
"Two fixes for the pata_hpt37x driver, both from Sergey:

   - Fix a PCI register access using an incorrect size (8bits instead of
     16bits)

   - Make sure to always disable the primary channel as it is unused"

* tag 'ata-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_hpt37x: disable primary channel on HPT371
  ata: pata_hpt37x: fix PCI clock detection

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A couple driver fixes in the clk subsystem

   - Fix a hang due to bad clk parent in the Ingenic jz4725b driver

   - Fix SD controllers on Qualcomm MSM8994 SoCs by removing clks that
     shouldn't be touched"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: jz4725b: fix mmc0 clock gating
  clk: qcom: gcc-msm8994: Remove NoC clocks

Merge tag 'drm-fixes-2022-02-25' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Regular drm fixes pull, i915, amdgpu and tegra mostly, all pretty
  small.

  core:
   - edid: Always set RGB444

  tegra:
   - tegra186 suspend/resume fixes
   - syncpoint wait fix
   - build warning fix
   - eDP on older devices fix

  amdgpu:
   - Display FP fix
   - PCO powergating fix
   - RDNA2 OEM SKU stability fixes
   - Display PSR fix
   - PCI ASPM fix
   - Display link encoder fix for TEST_COMMIT
   - Raven2 suspend/resume fix
   - Fix a regression in virtual display support
   - GPUVM eviction fix

  i915:
   - Fix QGV handling on ADL-P+
   - Fix bw atomic check when switching between SAGV vs. no SAGV
   - Disconnect PHYs left connected by BIOS on disabled ports
   - Fix SAVG to no SAGV transitions on TGL+
   - Print PHY name properly on calibration error (DG2)

  imx:
   - dcss: Select GEM CMA helpers

  radeon:
   - Fix some variables's type

  vc4:
   - Fix codec cleanup
   - Fix PM reference counting"

* tag 'drm-fixes-2022-02-25' of git://anongit.freedesktop.org/drm/drm: (24 commits)
  drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
  drm/amdgpu: bypass tiling flag check in virtual display case (v2)
  Revert "drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()"
  drm/amdgpu: do not enable asic reset for raven2
  drm/amd/display: Fix stream->link_enc unassigned during stream removal
  drm/amd: Check if ASPM is enabled from PCIe subsystem
  drm/edid: Always set RGB444
  drm/tegra: dpaux: Populate AUX bus
  drm/radeon: fix variable type
  drm/amd/display: For vblank_disable_immediate, check PSR is really used
  drm/amd/pm: fix some OEM SKU specific stability issues
  drm/amdgpu: disable MMHUB PG for Picasso
  drm/amd/display: Protect update_bw_bounding_box FPU code.
  drm/i915/dg2: Print PHY name properly on calibration error
  drm/i915: Fix bw atomic check when switching between SAGV vs. no SAGV
  drm/i915: Correctly populate use_sagv_wm for all pipes
  drm/i915: Disconnect PHYs left connected by BIOS on disabled ports
  drm/i915: Widen the QGV point mask
  drm/imx/dcss: i.MX8MQ DCSS select DRM_GEM_CMA_HELPER
  drm/vc4: crtc: Fix runtime_pm reference counting
  ...

Merge tag 'perf-tools-fixes-for-v5.17-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix double free in in the error path when opening perf.data from
   multiple files in a directory instead of from a single file

- Sync the msr-index.h copy with the kernel sources

- Fix error when printing 'weight' field in 'perf script'

- Skip failing sigtrap test for arm+aarch64 in 'perf test'

- Fix failure to use a cpu list for uncore events in hybrid systems,
   e.g. Intel Alder Lake

* tag 'perf-tools-fixes-for-v5.17-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf script: Fix error when printing 'weight' field
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf data: Fix double free in perf_session__delete()
  perf evlist: Fix failed to use cpu list for uncore events
  perf test: Skip failing sigtrap test for arm+aarch64

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"x86 host:

   - Expose KVM_CAP_ENABLE_CAP since it is supported

   - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode

   - Ensure async page fault token is nonzero

   - Fix lockdep false negative

   - Fix FPU migration regression from the AMX changes

  x86 guest:

   - Don't use PV TLB/IPI/yield on uniprocessor guests

  PPC:

   - reserve capability id (topic branch for ppc/kvm)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
  KVM: x86/mmu: make apf token non-zero to fix bug
  KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
  x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
  x86/kvm: Fix compilation warning in non-x86_64 builds
  x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
  x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
  KVM: Fix lockdep false negative during host resume
  KVM: x86: Add KVM_CAP_ENABLE_CAP to x86

Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

- Fix a merge error that broke PCI device enumeration on mvebu
   platforms, including Turris Omnia (Armada 385) (Pali Rohár)

- Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
   VBIOSes don't account for "harvested" (disabled) parts of the chip
   when initializing caches (Alex Deucher)

* tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
  PCI: mvebu: Fix device enumeration regression

Merge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.

  Current release - regressions:

   - bpf: fix crash due to out of bounds access into reg2btf_ids

   - mvpp2: always set port pcs ops, avoid null-deref

   - eth: marvell: fix driver load from initrd

   - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"

  Current release - new code bugs:

   - mptcp: fix race in overlapping signal events

  Previous releases - regressions:

   - xen-netback: revert hotplug-status changes causing devices to not
     be configured

   - dsa:
      - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
        held
      - fix panic when removing unoffloaded port from bridge

   - dsa: microchip: fix bridging with more than two member ports

  Previous releases - always broken:

   - bpf:
      - fix crash due to incorrect copy_map_value when both spin lock
        and timer are present in a single value
      - fix a bpf_timer initialization issue with clang
      - do not try bpf_msg_push_data with len 0
      - add schedule points in batch ops

   - nf_tables:
      - unregister flowtable hooks on netns exit
      - correct flow offload action array size
      - fix a couple of memory leaks

   - vsock: don't check owner in vhost_vsock_stop() while releasing

   - gso: do not skip outer ip header in case of ipip and net_failover

   - smc: use a mutex for locking "struct smc_pnettable"

   - openvswitch: fix setting ipv6 fields causing hw csum failure

   - mptcp: fix race in incoming ADD_ADDR option processing

   - sysfs: add check for netdevice being present to speed_show

   - sched: act_ct: fix flow table lookup after ct clear or switching
     zones

   - eth: intel: fixes for SR-IOV forwarding offloads

   - eth: broadcom: fixes for selftests and error recovery

   - eth: mellanox: flow steering and SR-IOV forwarding fixes

  Misc:

   - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
     friends not report freed skbs as drops

   - force inlining of checksum functions in net/checksum.h"

* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: mv643xx_eth: process retval from of_get_mac_address
  ping: remove pr_err from ping_lookup
  Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
  openvswitch: Fix setting ipv6 fields causing hw csum failure
  ipv6: prevent a possible race condition with lifetimes
  net/smc: Use a mutex for locking "struct smc_pnettable"
  bnx2x: fix driver load from initrd
  Revert "xen-netback: Check for hotplug-status existence before watching"
  Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  ...

Merge tag 'drm-intel-fixes-2022-02-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix QGV handling on ADL-P+ (Ville Syrjälä)
- Fix bw atomic check when switching between SAGV vs. no SAGV (Ville Syrjälä)
- Disconnect PHYs left connected by BIOS on disabled ports (Imre Deak)
- Fix SAVG to no SAGV transitions on TGL+ (Ville Syrjälä)
- Print PHY name properly on calibration error (DG2) (Matt Roper)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YhdyHwRWkOTWwlqi@tursulin-mobl2

Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request:
    - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
    - fix passthrough to namespaces with unsupported features (Christoph
      Hellwig)

- Clear iocb->private at poll completion (Stefano)

* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata
  block: clear iocb->private in blkdev_bio_end_io_async()

Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

- Add a conditional schedule point in io_add_buffers() (Eric)

- Fix for a quiesce speedup merged in this release (Dylan)

- Don't convert to jiffies for event timeout waiting, it's way too
   coarse when we accept a timespec as input (me)

* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
  io_uring: disallow modification of rsrc_data during quiesce
  io_uring: don't convert to jiffies for waiting on timeouts
  io_uring: add a schedule point in io_add_buffers()

Merge tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull more x86 platform driver fixes from Hans de Goede:
"Two more fixes:

   - Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16

   - Fix Microsoft Surface 3 battery readings"

* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  surface: surface3_power: Fix battery readings on batteries without a serial number
  platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

net: mv643xx_eth: process retval from of_get_mac_address

Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.

It is also possible that a port that does not rely on NVMEM has been
already created when getting the defer request. Thus, also the resources
allocated previously must be freed when doing a roll-back.

Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled

If nested tsc scaling is disabled, MSR_AMD64_TSC_RATIO should
never have non default value.

Due to way nested tsc scaling support was implmented in qemu,
it would set this msr to 0 when nested tsc scaling was disabled.
Ignore that value for now, as it causes no harm.

Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling")
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220223115649.319134-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: make apf token non-zero to fix bug

In current async pagefault logic, when a page is ready, KVM relies on
kvm_arch_can_dequeue_async_page_present() to determine whether to deliver
a READY event to the Guest. This function test token value of struct
kvm_vcpu_pv_apf_data, which must be reset to zero by Guest kernel when a
READY event is finished by Guest. If value is zero meaning that a READY
event is done, so the KVM can deliver another.
But the kvm_arch_setup_async_pf() may produce a valid token with zero
value, which is confused with previous mention and may lead the loss of
this READY event.

This bug may cause task blocked forever in Guest:
INFO: task stress:7532 blocked for more than 1254 seconds.
       Not tainted 5.10.0 #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:stress          state:D stack:    0 pid: 7532 ppid:  1409
flags:0x00000080
Call Trace:
  __schedule+0x1e7/0x650
  schedule+0x46/0xb0
  kvm_async_pf_task_wait_schedule+0xad/0xe0
  ? exit_to_user_mode_prepare+0x60/0x70
  __kvm_handle_async_pf+0x4f/0xb0
  ? asm_exc_page_fault+0x8/0x30
  exc_page_fault+0x6f/0x110
  ? asm_exc_page_fault+0x8/0x30
  asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x402d00
RSP: 002b:00007ffd31912500 EFLAGS: 00010206
RAX: 0000000000071000 RBX: ffffffffffffffff RCX: 00000000021a32b0
RDX: 000000000007d011 RSI: 000000000007d000 RDI: 00000000021262b0
RBP: 00000000021262b0 R08: 0000000000000003 R09: 0000000000000086
R10: 00000000000000eb R11: 00007fefbdf2baa0 R12: 0000000000000000
R13: 0000000000000002 R14: 000000000007d000 R15: 0000000000001000

Signed-off-by: Liang Zhang <zhangliang5@huawei.com>
Message-Id: <20220222031239.1076682-1-zhangliang5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ping: remove pr_err from ping_lookup

As Jakub noticed, prints should be avoided on the datapath.
Also, as packets would never come to the else branch in
ping_lookup(), remove pr_err() from ping_lookup().

Fixes: 35a79e64de29 ("ping: fix the dif and sdif check in ping_lookup")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/1ef3f2fcd31bd681a193b1fcf235eee1603819bd.1645674068.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"

Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.

Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391

This reverts commit 3d2504663c41104b4359a15f35670cfa82de1bbf.

Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

openvswitch: Fix setting ipv6 fields causing hw csum failure

Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.

Fix this by updating skb->csum if available.

Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728]  <IRQ>
[295241.927836]  dump_stack+0x5c/0x80
[295241.931240]  __skb_checksum_complete+0xac/0xc0
[295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498]  netif_receive_skb_internal+0x3e/0xc0
[295242.052291]  napi_gro_receive+0xba/0xe0
[295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958]  net_rx_action+0x149/0x3b0
[295242.086762]  __do_softirq+0xd7/0x2d6
[295242.090427]  irq_exit+0xf7/0x100
[295242.093748]  do_IRQ+0x7f/0xd0
[295242.096806]  common_interrupt+0xf/0xf
[295242.100559]  </IRQ>
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
[295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
[295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
[295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
[295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40

Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: prevent a possible race condition with lifetimes

valid_lft, prefered_lft and tstamp are always accessed under the lock
"lock" in other places. Reading these without taking the lock may result
in inconsistencies regarding the calculation of the valid and preferred
variables since decisions are taken on these fields for those variables.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: Use a mutex for locking "struct smc_pnettable"

smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
which, in turn, calls mutex_lock(&smc_ib_devices.mutex).

read_lock() disables preemption. Therefore, the code acquires a mutex while in
atomic context and it leads to a SAC bug.

Fix this bug by replacing the rwlock with a mutex.

Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com
Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support")
Confirmed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2x: fix driver load from initrd

Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") added
new firmware support in the driver with maintaining older firmware
compatibility. However, older firmware was not added in MODULE_FIRMWARE()
which caused missing firmware files in initrd image leading to driver load
failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware
version to have firmware files included in initrd.

Fixes: b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "xen-netback: Check for hotplug-status existence before watching"

This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d.

The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).

This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"

This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2.

The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.

Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring: documentation fixup

Fix incorrect name reference in comment. ki_filp does not exist in the
struct, but file does.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220224105157.1332353-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.17

- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (me)"

* tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata

Merge tag 'usb-serial-5.17-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for 5.17-rc6

Here's a revert of a commit which erroneously added a device id used for
the EPP/MEM mode of ch341 devices.

Included are also some new modem device ids.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.17-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: option: add Telit LE910R1 compositions
  USB: serial: option: add support for DW5829e
  Revert "USB: serial: ch341: add new Product ID for CH341A"

surface: surface3_power: Fix battery readings on batteries without a serial number

The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.

This was causing mshw0011_bix() to fail causing the battery readings to
not work at all.

Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.

Fixes: b1f81b496b0d ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
BugLink: https://github.com/linux-surface/linux-surface/issues/608
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com

platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

commit 59348401ebed ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.

When this feature was used with 5.16-rc1 or later some OEM laptops with the
matching firmware requirements from that commit would shutdown instead of
program a timer based wakeup.

This was bisected to commit 8d89835b0467 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.

It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using CZN timer wakeup path.

If later it's discovered that this also occurs from "regular" suspends
without a timer as well or on other silicon, this may be later expanded to
run in the suspend path for more scenarios.

Cc: stable@vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup"
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

usb: gadget: rndis: add spinlock for rndis response list

There's no lock for rndis response list. It could cause list corruption
if there're two different list_add at the same time like below.
It's better to add in rndis_add_response / rndis_free_response
/ rndis_get_next_response to prevent any race condition on response list.

[  361.894299] [1:   irq/191-dwc3:16979] list_add corruption.
next->prev should be prev (ffffff80651764d0),
but was ffffff883dc36f80. (next=ffffff80651764d0).

[  361.904380] [1:   irq/191-dwc3:16979] Call trace:
[  361.904391] [1:   irq/191-dwc3:16979]  __list_add_valid+0x74/0x90
[  361.904401] [1:   irq/191-dwc3:16979]  rndis_msg_parser+0x168/0x8c0
[  361.904409] [1:   irq/191-dwc3:16979]  rndis_command_complete+0x24/0x84
[  361.904417] [1:   irq/191-dwc3:16979]  usb_gadget_giveback_request+0x20/0xe4
[  361.904426] [1:   irq/191-dwc3:16979]  dwc3_gadget_giveback+0x44/0x60
[  361.904434] [1:   irq/191-dwc3:16979]  dwc3_ep0_complete_data+0x1e8/0x3a0
[  361.904442] [1:   irq/191-dwc3:16979]  dwc3_ep0_interrupt+0x29c/0x3dc
[  361.904450] [1:   irq/191-dwc3:16979]  dwc3_process_event_entry+0x78/0x6cc
[  361.904457] [1:   irq/191-dwc3:16979]  dwc3_process_event_buf+0xa0/0x1ec
[  361.904465] [1:   irq/191-dwc3:16979]  dwc3_thread_interrupt+0x34/0x5c

Fixes: f6281af9d62e ("usb: gadget: rndis: use list_for_each_entry_safe")
Cc: stable <stable@kernel.org>
Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
Link: https://lore.kernel.org/r/1645507768-77687-1-git-send-email-dh10.jung@samsung.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: dwc3: gadget: Let the interrupt handler disable bottom halves.

The interrupt service routine registered for the gadget is a primary
handler which mask the interrupt source and a threaded handler which
handles the source of the interrupt. Since the threaded handler is
voluntary threaded, the IRQ-core does not disable bottom halves before
invoke the handler like it does for the forced-threaded handler.

Due to changes in networking it became visible that a network gadget's
completions handler may schedule a softirq which remains unprocessed.
The gadget's completion handler is usually invoked either in hard-IRQ or
soft-IRQ context. In this context it is enough to just raise the softirq
because the softirq itself will be handled once that context is left.
In the case of the voluntary threaded handler, there is nothing that
will process pending softirqs. Which means it remain queued until
another random interrupt (on this CPU) fires and handles it on its exit
path or another thread locks and unlocks a lock with the bh suffix.
Worst case is that the CPU goes idle and the NOHZ complains about
unhandled softirqs.

Disable bottom halves before acquiring the lock (and disabling
interrupts) and enable them after dropping the lock. This ensures that
any pending softirqs will handled right away.

Link: https://lkml.kernel.org/r/c2a64979-73d1-2c22-e048-c275c9f81558@samsung.com
Fixes: e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")
Cc: stable <stable@kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/Yg/YPejVQH3KkRVd@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: gadget: validate endpoint index for xilinx udc

Assure that host may not manipulate the index to point
past endpoint array.

Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-02-22

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

* tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  net/mlx5: Fix tc max supported prio for nic mode
  net/mlx5: Fix wrong limitation of metadata match on ecpf
  net/mlx5: Update log_max_qp value to be 17 at most
  net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
  net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
  net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
  net/mlx5: DR, Cache STE shadow memory
  net/mlx5: Update the list of the PCI supported devices
====================

Link: https://lore.kernel.org/r/20220224001123.365265-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'amd-drm-fixes-5.17-2022-02-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.17-2022-02-23:

amdgpu:
- Display FP fix
- PCO powergating fix
- RDNA2 OEM SKU stability fixes
- Display PSR fix
- PCI ASPM fix
- Display link encoder fix for TEST_COMMIT
- Raven2 suspend/resume fix
- Fix a regression in virtual display support
- GPUVM eviction fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223214623.28823-1-alexander.deucher@amd.com

Merge tag 'drm/tegra/for-5.17-rc6' of https://gitlab.freedesktop.org/drm/tegra into drm-fixes

drm/tegra: Fixes for v5.17-rc6

Contains a couple of fixes for Tegra186 suspend/resume, syncpoint
waiting, a build warning and eDP on older Tegra devices.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223161903.293392-1-thierry.reding@gmail.com

Merge tag 'drm-misc-fixes-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

* edid: Always set RGB444
* imx/dcss: Select GEM CMA helpers
* radeon: Fix some variables's type
* vc4: Fix codec cleanup; Fix PM reference counting

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YhaKj4zWJ42YWRts@linux-uq9g.fritz.box

Merge tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Update some maintainers email addresses

- Fix handling of elfcorehdr reservation for crash dump kernel

- Fix unittest expected warnings text

* tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: update Roger Quadros email
  MAINTAINERS: sifive: drop Yash Shah
  of/fdt: move elfcorehdr reservation early for crash dump kernel
  of: unittest: update text of expected warnings

Merge tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
"A second small SELinux fix which addresses an incorrect
mutex_is_locked() check"

* tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix misuse of mutex_is_locked()

net/mlx5e: Fix VF min/max rate parameters interchange mistake

The VF min and max rate were passed incorrectly and resulted in wrongly
interchanging them. Fix the order of parameters in
mlx5_esw_qos_set_vport_rate().

Fixes: d7df09f5e7b4 ("net/mlx5: E-switch, Enable vport QoS on demand")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add missing increment of count

Add mistakenly missing increment of count variable when looping over
output buffer in mlx5e_self_test().

This resolves the issue of garbage values output when querying with self
test via ethtool.

before:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test        0
Speed Test       1768697188
Health Test      758528120
Loopback Test    3288687

after:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test        0
Speed Test       0
Health Test      0
Loopback Test    0

Fixes: 7990b1b5e8bd ("net/mlx5e: loopback test is not supported in switchdev mode")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: MPLSoUDP decap, fix check for unsupported matches

Currently offload of rule on bareudp device require tunnel key
in order to match on mpls fields and without it the mpls fields
are ignored, this is incorrect due to the fact udp tunnel doesn't
have key to match on.

Fix by returning error in case flow is matching on tunnel key.

Fixes: 72046a91d134 ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix MPLSoUDP encap to use MPLS action information

Currently the MPLSoUDP encap builds the MPLS header using encap action
information (tunnel id, ttl and tos) instead of the MPLS action
information (label, ttl, tc and bos) which is wrong.

Fix by storing the MPLS action information during the flow action
parse and later using it to create the encap MPLS header.

Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add feature check for set fec counters

Fec counters support is checked via the PCAM feature_cap_mask,
bit 0: PPCNT_counter_group_Phy_statistical_counter_group.
Add feature check to avoid faulty behavior.

Fixes: 0a1498ebfa55 ("net/mlx5e: Expose FEC counters via ethtool")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Skip redundant ct clear actions

Offload of ct clear action is just resetting the reg_c register.
It's done by allocating modify hdr resources which is limited.
Doing it multiple times is redundant and wasting modify hdr resources
and if resources depleted the driver will fail offloading the rule.
Ignore redundant ct clear actions after the first one.

Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Reject rules with forward and drop actions

Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.

Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Reject rules with drop and modify hdr action

This kind of action is not supported by firmware and generates a
syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets

For RX TLS device-offloaded packets, the HW spec guarantees checksum
validation for the offloaded packets, but does not define whether the
CQE.checksum field matches the original packet (ciphertext) or
the decrypted one (plaintext). This latitude allows architetctural
improvements between generations of chips, resulting in different decisions
regarding the value type of CQE.checksum.

Hence, for these packets, the device driver should not make use of this CQE
field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded
packets, and use CHECKSUM_UNNECESSARY instead.

Value of the packet's tcp_hdr.csum is not modified by the HW, and it always
matches the original ciphertext.

Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix wrong return value on ioctl EEPROM query failure

The ioctl EEPROM query wrongly returns success on read failures, fix
that by returning the appropriate error code.

Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix possible deadlock on rule deletion

Add missing call to up_write_ref_node() which releases the semaphore
in case the FTE doesn't have destinations, such in drop rule case.

Fixes: 465e7baab6d9 ("net/mlx5: Fix deletion of duplicate rules")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix tc max supported prio for nic mode

Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.

Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix wrong limitation of metadata match on ecpf

Match metadata support check returns false for ecpf device.
However, this support does exist for ecpf and therefore this
limitation should be removed to allow feature such as stacked
devices and internal port offloaded to be supported.

Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Update log_max_qp value to be 17 at most

Currently, log_max_qp value is dependent on what FW reports as its max capability.
In reality, due to a bug, some FWs report a value greater than 17, even though they
don't support log_max_qp > 17.

This FW issue led the driver to exhaust memory on startup.
Thus, log_max_qp value is set to be no more than 17 regardless
of what FW reports, as it was before the cited commit.

Fixes: f79a609ea6bf ("net/mlx5: Update log_max_qp value to FW max capability")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix the threshold that defines when pool sync is initiated

When deciding whether to start syncing and actually free all the "hot"
ICM chunks, we need to consider the type of the ICM chunks that we're
dealing with. For instance, the amount of available ICM for MODIFY_ACTION
is significantly lower than the usual STE ICM, so the threshold should
account for that - otherwise we can deplete MODIFY_ACTION memory just by
creating and deleting the same modify header action in a continuous loop.

This patch replaces the hard-coded threshold with a dynamic value.

Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version

Currently SMFS allows adding rule with matching on src/dst IP w/o matching
on full ethertype or ip_version, which is not supported by HW.
This patch fixes this issue and adds the check as it is done in DMFS.

Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte

When adding a rule with 32 destinations, we hit the following out-of-band
access issue:

BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70

This patch fixes the issue by both increasing the allocated buffers to
accommodate for the needed actions and by checking the number of actions
to prevent this issue when a rule with too many actions is provided.

Fixes: 1ffd498901c1 ("net/mlx5: DR, Increase supported num of actions to 32")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>