]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
12 months agoslab: create kmem_cache_create() compatibility layer
Christian Brauner [Thu, 5 Sep 2024 07:56:55 +0000 (09:56 +0200)]
slab: create kmem_cache_create() compatibility layer

Use _Generic() to create a compatibility layer that type switches on the
third argument to either call __kmem_cache_create() or
__kmem_cache_create_args(). If NULL is passed for the struct
kmem_cache_args argument use default args making porting for callers
that don't care about additional arguments easy.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:54 +0000 (09:56 +0200)]
slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args

Make KMEM_CACHE_USERCOPY() use struct kmem_cache_args.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: port KMEM_CACHE() to struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:53 +0000 (09:56 +0200)]
slab: port KMEM_CACHE() to struct kmem_cache_args

Make KMEM_CACHE() use struct kmem_cache_args.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: remove rcu_freeptr_offset from struct kmem_cache
Christian Brauner [Thu, 5 Sep 2024 07:56:52 +0000 (09:56 +0200)]
slab: remove rcu_freeptr_offset from struct kmem_cache

Pass down struct kmem_cache_args to calculate_sizes() so we can use
args->{use}_freeptr_offset directly. This allows us to remove
->rcu_freeptr_offset from struct kmem_cache.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: pass struct kmem_cache_args to do_kmem_cache_create()
Christian Brauner [Thu, 5 Sep 2024 07:56:51 +0000 (09:56 +0200)]
slab: pass struct kmem_cache_args to do_kmem_cache_create()

and initialize most things in do_kmem_cache_create(). In a follow-up
patch we'll remove rcu_freeptr_offset from struct kmem_cache.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: pull kmem_cache_open() into do_kmem_cache_create()
Christian Brauner [Thu, 5 Sep 2024 07:56:50 +0000 (09:56 +0200)]
slab: pull kmem_cache_open() into do_kmem_cache_create()

do_kmem_cache_create() is the only caller and we're going to pass down
struct kmem_cache_args in a follow-up patch.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: pass struct kmem_cache_args to create_cache()
Christian Brauner [Thu, 5 Sep 2024 07:56:49 +0000 (09:56 +0200)]
slab: pass struct kmem_cache_args to create_cache()

Pass struct kmem_cache_args to create_cache() so that we can later
simplify further helpers.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: port kmem_cache_create_usercopy() to struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:48 +0000 (09:56 +0200)]
slab: port kmem_cache_create_usercopy() to struct kmem_cache_args

Port kmem_cache_create_usercopy() to struct kmem_cache_args and remove
the now unused do_kmem_cache_create_usercopy() helper.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: port kmem_cache_create_rcu() to struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:47 +0000 (09:56 +0200)]
slab: port kmem_cache_create_rcu() to struct kmem_cache_args

Port kmem_cache_create_rcu() to struct kmem_cache_args.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: port kmem_cache_create() to struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:46 +0000 (09:56 +0200)]
slab: port kmem_cache_create() to struct kmem_cache_args

Port kmem_cache_create() to struct kmem_cache_args.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: add struct kmem_cache_args
Christian Brauner [Thu, 5 Sep 2024 07:56:45 +0000 (09:56 +0200)]
slab: add struct kmem_cache_args

Currently we have multiple kmem_cache_create*() variants that take up to
seven separate parameters with one of the functions having to grow an
eigth parameter in the future to handle both usercopy and a custom
freelist pointer.

Add a struct kmem_cache_args structure and move less common parameters
into it. Core parameters such as name, object size, and flags continue
to be passed separately.

Add a new function __kmem_cache_create_args() that takes a struct
kmem_cache_args pointer and port do_kmem_cache_create_usercopy() over to
it.

In follow-up patches we will port the other kmem_cache_create*()
variants over to it as well.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoslab: s/__kmem_cache_create/do_kmem_cache_create/g
Christian Brauner [Thu, 5 Sep 2024 07:56:44 +0000 (09:56 +0200)]
slab: s/__kmem_cache_create/do_kmem_cache_create/g

Free up reusing the double-underscore variant for follow-up patches.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoMerge branch 'vfs.file' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs into...
Vlastimil Babka [Thu, 5 Sep 2024 09:31:32 +0000 (11:31 +0200)]
Merge branch 'vfs.file' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs into slab/for-6.12/kmem_cache_args

Merge prerequisities from the vfs git tree for the following series that
introduces kmem_cache_args. The vfs.file branch includes the addition of
kmem_cache_create_rcu() which was needed in vfs for the filp cache
optimization. The following series refactors this code.

12 months agomemcg: add charging of already allocated slab objects
Shakeel Butt [Thu, 5 Sep 2024 17:34:22 +0000 (10:34 -0700)]
memcg: add charging of already allocated slab objects

At the moment, the slab objects are charged to the memcg at the
allocation time. However there are cases where slab objects are
allocated at the time where the right target memcg to charge it to is
not known. One such case is the network sockets for the incoming
connection which are allocated in the softirq context.

Couple hundred thousand connections are very normal on large loaded
server and almost all of those sockets underlying those connections get
allocated in the softirq context and thus not charged to any memcg.
However later at the accept() time we know the right target memcg to
charge. Let's add new API to charge already allocated objects, so we can
have better accounting of the memory usage.

To measure the performance impact of this change, tcp_crr is used from
the neper [1] performance suite. Basically it is a network ping pong
test with new connection for each ping pong.

The server and the client are run inside 3 level of cgroup hierarchy
using the following commands:

Server:
 $ tcp_crr -6

Client:
 $ tcp_crr -6 -c -H ${server_ip}

If the client and server run on different machines with 50 GBPS NIC,
there is no visible impact of the change.

For the same machine experiment with v6.11-rc5 as base.

          base (throughput)     with-patch
tcp_crr   14545 (+- 80)         14463 (+- 56)

It seems like the performance impact is within the noise.

Link: https://github.com/google/neper
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com> # net
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
12 months agoiomap: remove the iomap_file_buffered_write_punch_delalloc return value
Christoph Hellwig [Tue, 10 Sep 2024 04:39:07 +0000 (07:39 +0300)]
iomap: remove the iomap_file_buffered_write_punch_delalloc return value

iomap_file_buffered_write_punch_delalloc can only return errors if either
the ->punch callback returned an error, or if someone changed the API of
mapping_seek_hole_data to return a negative error code that is not
-ENXIO.

As the only instance of ->punch never returns an error, an such an error
would be fatal anyway remove the entire error propagation and don't
return an error code from iomap_file_buffered_write_punch_delalloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240910043949.3481298-6-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoiomap: pass the iomap to the punch callback
Christoph Hellwig [Tue, 10 Sep 2024 04:39:06 +0000 (07:39 +0300)]
iomap: pass the iomap to the punch callback

XFS will need to look at the flags in the iomap structure, so pass it
down all the way to the callback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240910043949.3481298-5-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoiomap: pass flags to iomap_file_buffered_write_punch_delalloc
Christoph Hellwig [Tue, 10 Sep 2024 04:39:05 +0000 (07:39 +0300)]
iomap: pass flags to iomap_file_buffered_write_punch_delalloc

To fix short write error handling, We'll need to figure out what operation
iomap_file_buffered_write_punch_delalloc is called for.  Pass the flags
argument on to it, and reorder the argument list to match that of
->iomap_end so that the compiler only has to add the new punch argument
to the end of it instead of reshuffling the registers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240910043949.3481298-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoiomap: improve shared block detection in iomap_unshare_iter
Christoph Hellwig [Tue, 10 Sep 2024 04:39:04 +0000 (07:39 +0300)]
iomap: improve shared block detection in iomap_unshare_iter

Currently iomap_unshare_iter relies on the IOMAP_F_SHARED flag to detect
blocks to unshare.  This is reasonable, but IOMAP_F_SHARED is also useful
for the file system to do internal book keeping for out of place writes.
XFS used to that, until it got removed in commit 72a048c1056a
("xfs: only set IOMAP_F_SHARED when providing a srcmap to a write")
because unshare for incorrectly unshare such blocks.

Add an extra safeguard by checking the explicitly provided srcmap instead
of the fallback to the iomap for valid data, as that catches the case
where we'd just copy from the same place we'd write to easily, allowing
to reinstate setting IOMAP_F_SHARED for all XFS writes that go to the
COW fork.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240910043949.3481298-3-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoiomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
Christoph Hellwig [Tue, 10 Sep 2024 04:39:03 +0000 (07:39 +0300)]
iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release

When direct I/O completions invalidates the page cache it holds neither the
i_rwsem nor the invalidate_lock so it can be racing with
iomap_write_delalloc_release.  If the search for the end of the region that
contains data returns the start offset we hit such a race and just need to
look for the end of the newly created hole instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240910043949.3481298-2-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
12 months agoMerge branch 'net-lan966x-use-the-newly-introduced-fdma-library'
Paolo Abeni [Tue, 10 Sep 2024 09:04:18 +0000 (11:04 +0200)]
Merge branch 'net-lan966x-use-the-newly-introduced-fdma-library'

Daniel Machon says:

====================
net: lan966x: use the newly introduced FDMA library

This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation.  This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.

In this second series, the FDMA library will be taken into use by the
lan966x switch driver.

 ###################
 # Example of use: #
 ###################

- Initialize the rx and tx fdma structs with values for: number of
  DCB's, number of DB's, channel ID, DB size (data buffer size), and
  total size of the requested memory. Also provide two callbacks:
  nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.

- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().

- Initialize the DCB's with fdma_dcb_init().

- Add new DCB's with fdma_dcb_add().

- Free memory with fdma_free_phys() or fdma_free_coherent().

 #####################
 # Patch  breakdown: #
 #####################

Patch #1:  select FDMA library for lan966x.

Patch #2:  includes the fdma_api.h header and removes old symbols.

Patch #3:  replaces old rx and tx variables with equivalent ones from the
           fdma struct. Only the variables that can be changed without
           breaking traffic is changed in this patch.

Patch #4:  uses the library for allocation of rx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #5:  uses the library for adding DCB's in the rx path.

Patch #6:  uses the library for freeing rx buffers.

Patch #7:  uses the library for allocation of tx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #8:  uses the library for adding DCB's in the tx path.

Patch #9:  uses the library helpers in the tx path.

Patch #10: ditch last_in_use variable and use library instead.

Patch #11: uses library helpers throughout.

Patch #12: refactor lan966x_fdma_reload() function.

[1] https://lore.kernel.org/netdev/20240902-fdma-sparx5-v1-0-1e7d5e5a9f34@microchip.com/

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://patch.msgid.link/20240905-fdma-lan966x-v1-0-e083f8620165@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: refactor buffer reload function
Daniel Machon [Thu, 5 Sep 2024 08:06:40 +0000 (10:06 +0200)]
net: lan966x: refactor buffer reload function

Now that we store everything in the fdma structs, refactor
lan966x_fdma_reload() to store and restore the entire struct.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use a few FDMA helpers throughout
Daniel Machon [Thu, 5 Sep 2024 08:06:39 +0000 (10:06 +0200)]
net: lan966x: use a few FDMA helpers throughout

The library provides helpers for a number of DCB and DB operations. Use
these throughout the code and remove the old ones.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: ditch tx->last_in_use variable
Daniel Machon [Thu, 5 Sep 2024 08:06:38 +0000 (10:06 +0200)]
net: lan966x: ditch tx->last_in_use variable

This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use library helper for freeing tx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:37 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing tx buffers

The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use FDMA library for adding DCB's in the tx path
Daniel Machon [Thu, 5 Sep 2024 08:06:36 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the tx path

Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use the FDMA library for allocation of tx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:35 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of tx buffers

Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.

In order to replace the old buffers with the new ones, we have to do the
following refactoring:

    - use fdma_alloc_phys() and fdma_dcb_init()

    - replace the variables: tx->dma, tx->dcbs and tx->curr_entry
      with the equivalents from the FDMA struct.

    - add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr.

    - Initialize FDMA struct values.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use library helper for freeing rx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:34 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing rx buffers

The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use FDMA library for adding DCB's in the rx path
Daniel Machon [Thu, 5 Sep 2024 08:06:33 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the rx path

Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use the FDMA library for allocation of rx buffers
Daniel Machon [Thu, 5 Sep 2024 08:06:32 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of rx buffers

Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.

In order to replace the old buffers with the new ones, we have to do the
following refactoring:

    - use fdma_alloc_phys() and fdma_dcb_init()

    - replace the variables: rx->dma, rx->dcbs and rx->last_entry
      with the equivalents from the FDMA struct.

    - make use of fdma->db_size for rx buffer size.

    - add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr.

    - Initialize FDMA struct values.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: replace a few variables with new equivalent ones
Daniel Machon [Thu, 5 Sep 2024 08:06:31 +0000 (10:06 +0200)]
net: lan966x: replace a few variables with new equivalent ones

Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: use FDMA library symbols
Daniel Machon [Thu, 5 Sep 2024 08:06:30 +0000 (10:06 +0200)]
net: lan966x: use FDMA library symbols

Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agonet: lan966x: select FDMA library
Daniel Machon [Thu, 5 Sep 2024 08:06:29 +0000 (10:06 +0200)]
net: lan966x: select FDMA library

Select the newly introduced FDMA library.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
12 months agoMerge tag 'thermal-v6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git...
Rafael J. Wysocki [Tue, 10 Sep 2024 08:54:15 +0000 (10:54 +0200)]
Merge tag 'thermal-v6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into

Merge thermal drivers changes for v6.12-rc1 from Daniel Lezcano:

"- Add power domain DT bindings for new Amlogic SoCs (Georges Stark)

 - Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr() in the ST
   driver and add a Kconfig dependency on THERMAL_OF subsystem for the
   STi driver (Raphael Gallais-Pou)

 - Simplify with dev_err_probe() the error code path in the probe
   functions for the brcmstb driver (Yan Zhen)

 - Remove trailing space after \n newline in the Renesas driver (Colin
   Ian King)

 - Add DT binding compatible string for the SA8255p with the tsens
   driver (Nikunj Kela)

 - Use the devm_clk_get_enabled() helpers to simplify the init routine
   in the sprd driver (Huan Yang)

 - Remove __maybe_unused notations for the functions by using the new
   RUNTIME_PM_OPS() and SYSTEM_SLEEP_PM_OPS() macros on the IMx and
   Qoriq drivers (Fabio Estevam)

 - Remove unused declarations in the header file as the functions were
   removed in a previous change on the ti-soc-thermal driver (Zhang
   Zekun)

 - Simplify with dev_err_probe() the error code path in the probe
   functions for the imx_sc_thermal driver (Alexander Stein)"

* tag 'thermal-v6.12-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/imx_sc_thermal: Use dev_err_probe
  thermal/drivers/ti-soc-thermal: Remove unused declarations
  thermal/drivers/imx: Remove __maybe_unused notations
  thermal/drivers/qoriq: Remove __maybe_unused notations
  thermal/drivers/sprd: Use devm_clk_get_enabled() helpers
  dt-bindings: thermal: tsens: document support on SA8255p
  thermal/drivers/renesas: Remove trailing space after \n newline
  thermal/drivers/brcmstb_thermal: Simplify with dev_err_probe()
  thermal/drivers/sti: Depend on THERMAL_OF subsystem
  thermal/drivers/st: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
  dt-bindings: thermal: amlogic,thermal: add optional power-domains

12 months agoALSA: hda/realtek: Refactor and simplify Samsung Galaxy Book init
Joshua Grisham [Mon, 9 Sep 2024 19:30:00 +0000 (21:30 +0200)]
ALSA: hda/realtek: Refactor and simplify Samsung Galaxy Book init

I have done a lot of analysis for these type of devices and collaborated
quite a bit with Nick Weihs (author of the first patch submitted for this
including adding samsung_helper.c). More information can be found in the
issue on Github [1] including additional rationale and testing.

The existing implementation includes a large number of equalizer coef
values that are not necessary to actually init and enable the speaker
amps, as well as create a somewhat worse sound profile. Users have
reported "muffled" or "muddy" sound; more information about this including
my analysis of the differences can be found in the linked Github issue.

This patch refactors the "v2" version of ALC298_FIXUP_SAMSUNG_AMP to a much
simpler implementation which removes the new samsung_helper.c, reuses more
of the existing patch_realtek.c, and sends significantly fewer unnecessary
coef values (including removing all of these EQ-specific coef values).

A pcm_playback_hook is used to dynamically enable and disable the speaker
amps only when there will be audio playback; this is to match the behavior
of how the driver for these devices is working in Windows, and is
suspected but not yet tested or confirmed to help with power consumption.

Support for models with 2 speaker amps vs 4 speaker amps is controlled by
a specific quirk name for both types. A new int num_speaker_amps has been
added to alc_spec so that the hooks can know how many speaker amps to
enable or disable. This design was chosen to limit the number of places
that subsystem ids will need to be maintained: like this, they can be
maintained only once in the quirk table and there will not be another
separate list of subsystem ids to maintain elsewhere in the code.

Also updated the quirk name from ALC298_FIXUP_SAMSUNG_AMP2 to
ALC298_FIXUP_SAMSUNG_AMP_V2_.. as this is not a quirk for "Amp #2" on
ALC298 but is instead a different version of how to handle it.

More devices have been added (see Github issue for testing confirmation),
as well as a small cleanup to existing names.

[1]: https://github.com/thesofproject/linux/issues/4055#issuecomment-2323411911

Signed-off-by: Joshua Grisham <josh@joshuagrisham.com>
Link: https://patch.msgid.link/20240909193000.838815-1-josh@joshuagrisham.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 months agoALSA: hda/realtek: Enable mic on Vaio VJFH52
Edson Juliano Drosdeck [Mon, 9 Sep 2024 16:27:51 +0000 (13:27 -0300)]
ALSA: hda/realtek: Enable mic on Vaio VJFH52

Vaio VJFH52 is equipped with ACL256, and needs a
fix to make the internal mic and headphone mic to work.
Also must to limits the internal microphone boost.

Signed-off-by: Edson Juliano Drosdeck <edson.drosdeck@gmail.com>
Link: https://patch.msgid.link/20240909162751.4790-1-edson.drosdeck@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 months agoMerge branch 'for-linus' into for-next
Takashi Iwai [Tue, 10 Sep 2024 08:27:43 +0000 (10:27 +0200)]
Merge branch 'for-linus' into for-next

Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 months agoxen: move max_pfn in xen_memory_setup() out of function scope
Juergen Gross [Tue, 6 Aug 2024 08:24:41 +0000 (10:24 +0200)]
xen: move max_pfn in xen_memory_setup() out of function scope

Instead of having max_pfn as a local variable of xen_memory_setup(),
make it a static variable in setup.c instead. This avoids having to
pass it to subfunctions, which will be needed in more cases in future.

Rename it to ini_nr_pages, as the value denotes the currently usable
number of memory pages as passed from the hypervisor at boot time.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
12 months agoxen: move checks for e820 conflicts further up
Juergen Gross [Tue, 6 Aug 2024 07:56:48 +0000 (09:56 +0200)]
xen: move checks for e820 conflicts further up

Move the checks for e820 memory map conflicts using the
xen_chk_is_e820_usable() helper further up in order to prepare
resolving some of the possible conflicts by doing some e820 map
modifications, which must happen before evaluating the RAM layout.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
12 months agoxen: introduce generic helper checking for memory map conflicts
Juergen Gross [Fri, 2 Aug 2024 12:11:06 +0000 (14:11 +0200)]
xen: introduce generic helper checking for memory map conflicts

When booting as a Xen PV dom0 the memory layout of the dom0 is
modified to match that of the host, as this requires less changes in
the kernel for supporting Xen.

There are some cases, though, which are problematic, as it is the Xen
hypervisor selecting the kernel's load address plus some other data,
which might conflict with the host's memory map.

These conflicts are detected at boot time and result in a boot error.
In order to support handling at least some of these conflicts in
future, introduce a generic helper function which will later gain the
ability to adapt the memory layout when possible.

Add the missing check for the xen_start_info area.

Note that possible p2m map and initrd memory conflicts are handled
already by copying the data to memory areas not conflicting with the
memory map. The initial stack allocated by Xen doesn't need to be
checked, as early boot code is switching to the statically allocated
initial kernel stack. Initial page tables and the kernel itself will
be handled later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
12 months agoxen: use correct end address of kernel for conflict checking
Juergen Gross [Sat, 3 Aug 2024 06:01:22 +0000 (08:01 +0200)]
xen: use correct end address of kernel for conflict checking

When running as a Xen PV dom0 the kernel is loaded by the hypervisor
using a different memory map than that of the host. In order to
minimize the required changes in the kernel, the kernel adapts its
memory map to that of the host. In order to do that it is checking
for conflicts of its load address with the host memory map.

Unfortunately the tested memory range does not include the .brk
area, which might result in crashes or memory corruption when this
area does conflict with the memory map of the host.

Fix the test by using the _end label instead of __bss_stop.

Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
12 months agosched/debug: Fix the runnable tasks output
Huang Shijie [Fri, 6 Sep 2024 05:30:19 +0000 (13:30 +0800)]
sched/debug: Fix the runnable tasks output

The current runnable tasks output looks like:

  runnable tasks:
   S            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
  -------------------------------------------------------------------------------------------------------------
   Ikworker/R-rcu_g     4         0.129049 E         0.620179           0.750000         0.002920         2   100         0.000000         0.002920         0.000000         0.000000 0 0 /
   Ikworker/R-sync_     5         0.125328 E         0.624147           0.750000         0.001840         2   100         0.000000         0.001840         0.000000         0.000000 0 0 /
   Ikworker/R-slub_     6         0.120835 E         0.628680           0.750000         0.001800         2   100         0.000000         0.001800         0.000000         0.000000 0 0 /
   Ikworker/R-netns     7         0.114294 E         0.634701           0.750000         0.002400         2   100         0.000000         0.002400         0.000000         0.000000 0 0 /
   I    kworker/0:1     9       508.781746 E       511.754666           3.000000       151.575240       224   120         0.000000       151.575240         0.000000         0.000000 0 0 /

Which is messy. Remove the duplicate printing of sum_exec_runtime and
tidy up the layout to make it look like:

  runnable tasks:
   S            task   PID       vruntime   eligible    deadline             slice          sum-exec      switches  prio         wait-time        sum-sleep       sum-block  node   group-id  group-path
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   I     kworker/0:3  1698       295.001459   E         297.977619           3.000000        38.862920         9     120         0.000000         0.000000         0.000000   0      0        /
   I     kworker/0:4  1702       278.026303   E         281.026303           3.000000         9.918760         3     120         0.000000         0.000000         0.000000   0      0        /
   S  NetworkManager  2646         0.377936   E           2.598104           3.000000        98.535880       314     120         0.000000         0.000000         0.000000   0      0        /system.slice/NetworkManager.service
   S       virtqemud  2689         0.541016   E           2.440104           3.000000        50.967960        80     120         0.000000         0.000000         0.000000   0      0        /system.slice/virtqemud.service
   S   gsd-smartcard  3058        73.604144   E          76.475904           3.000000        74.033320        88     120         0.000000         0.000000         0.000000   0      0        /user.slice/user-42.slice/session-c1.scope

Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240906053019.7874-1-shijie@os.amperecomputing.com
12 months agosched: Fix sched_delayed vs sched_core
Peter Zijlstra [Thu, 5 Sep 2024 15:02:24 +0000 (17:02 +0200)]
sched: Fix sched_delayed vs sched_core

Completely analogous to commit dfa0a574cbc4 ("sched/uclamg: Handle
delayed dequeue"), avoid double dequeue for the sched_core entries.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
12 months agokernel/sched: Fix util_est accounting for DELAY_DEQUEUE
Dietmar Eggemann [Wed, 4 Sep 2024 22:05:23 +0000 (00:05 +0200)]
kernel/sched: Fix util_est accounting for DELAY_DEQUEUE

Remove delayed tasks from util_est even they are runnable.

Exclude delayed task which are (a) migrating between rq's or (b) in a
SAVE/RESTORE dequeue/enqueue.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/c49ef5fe-a909-43f1-b02f-a765ab9cedbf@arm.com
12 months agokthread: Fix task state in kthread worker if being frozen
Chen Yu [Tue, 27 Aug 2024 11:23:08 +0000 (19:23 +0800)]
kthread: Fix task state in kthread worker if being frozen

When analyzing a kernel waring message, Peter pointed out that there is a race
condition when the kworker is being frozen and falls into try_to_freeze() with
TASK_INTERRUPTIBLE, which could trigger a might_sleep() warning in try_to_freeze().
Although the root cause is not related to freeze()[1], it is still worthy to fix
this issue ahead.

One possible race scenario:

        CPU 0                                           CPU 1
        -----                                           -----

        // kthread_worker_fn
        set_current_state(TASK_INTERRUPTIBLE);
                                                       suspend_freeze_processes()
                                                         freeze_processes
                                                           static_branch_inc(&freezer_active);
                                                         freeze_kernel_threads
                                                           pm_nosig_freezing = true;
        if (work) { //false
          __set_current_state(TASK_RUNNING);

        } else if (!freezing(current)) //false, been frozen

                      freezing():
                      if (static_branch_unlikely(&freezer_active))
                        if (pm_nosig_freezing)
                          return true;
          schedule()
}

        // state is still TASK_INTERRUPTIBLE
        try_to_freeze()
          might_sleep() <--- warning

Fix this by explicitly set the TASK_RUNNING before entering
try_to_freeze().

Fixes: b56c0d8937e6 ("kthread: implement kthread_worker")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Zs2ZoAcUsZMX2B%2FI@chenyu5-mobl2/
12 months agosched/pelt: Use rq_clock_task() for hw_pressure
Chen Yu [Tue, 27 Aug 2024 11:26:07 +0000 (19:26 +0800)]
sched/pelt: Use rq_clock_task() for hw_pressure

commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
removed the decay_shift for hw_pressure. This commit uses the
sched_clock_task() in sched_tick() while it replaces the
sched_clock_task() with rq_clock_pelt() in __update_blocked_others().
This could bring inconsistence. One possible scenario I can think of
is in ___update_load_sum():

  u64 delta = now - sa->last_update_time

'now' could be calculated by rq_clock_pelt() from
__update_blocked_others(), and last_update_time was calculated by
rq_clock_task() previously from sched_tick(). Usually the former
chases after the latter, it cause a very large 'delta' and brings
unexpected behavior.

Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20240827112607.181206-1-yu.c.chen@intel.com
12 months agosched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
Vincent Guittot [Wed, 4 Sep 2024 09:24:17 +0000 (11:24 +0200)]
sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c

Move effective_cpu_util() and sched_cpu_util() functions in fair.c file
with others utilization related functions.

No functional change.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240904092417.20660-1-vincent.guittot@linaro.org
12 months agosched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
Peter Zijlstra [Fri, 9 Aug 2024 09:22:40 +0000 (09:22 +0000)]
sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()

Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()") an idle CPU in TIF_POLLING_NRFLAG mode
can be pulled out of idle by setting TIF_NEED_RESCHED flag to service an
IPI without actually sending an interrupt. Even in cases where the IPI
handler does not queue a task on the idle CPU, do_idle() will call
__schedule() since need_resched() returns true in these cases.

Introduce and use SM_IDLE to identify call to __schedule() from
schedule_idle() and shorten the idle re-entry time by skipping
pick_next_task() when nr_running is 0 and the previous task is the idle
task.

With the SM_IDLE fast-path, the time taken to complete a fixed set of
IPIs using ipistorm improves noticeably. Following are the numbers
from a dual socket Intel Ice Lake Xeon server (2 x 32C/64T) and
3rd Generation AMD EPYC system (2 x 64C/128T) (boost on, C2 disabled)
running ipistorm between CPU8 and CPU16:

cmdline: insmod ipistorm.ko numipi=100000 single=1 offset=8 cpulist=8 wait=1

   ==================================================================
   Test          : ipistorm (modified)
   Units         : Normalized runtime
   Interpretation: Lower is better
   Statistic     : AMean
   ======================= Intel Ice Lake Xeon ======================
   kernel: time [pct imp]
   tip:sched/core 1.00 [baseline]
   tip:sched/core + SM_IDLE 0.80 [20.51%]
   ==================== 3rd Generation AMD EPYC =====================
   kernel: time [pct imp]
   tip:sched/core 1.00 [baseline]
   tip:sched/core + SM_IDLE 0.90 [10.17%]
   ==================================================================

[ kprateek: Commit message, SM_RTLOCK_WAIT fix ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240809092240.6921-1-kprateek.nayak@amd.com
12 months agoerofs: simplify erofs_map_blocks_flatmode()
Hongzhen Luo [Thu, 5 Sep 2024 03:03:39 +0000 (11:03 +0800)]
erofs: simplify erofs_map_blocks_flatmode()

Get rid of redundant variables (nblocks, offset) and a dead branch
(!tailendpacking).

Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240905030339.1474396-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
12 months agoerofs: refactor read_inode calling convention
Yiyang Wu [Mon, 2 Sep 2024 09:34:12 +0000 (17:34 +0800)]
erofs: refactor read_inode calling convention

Refactor out the iop binding behavior out of the erofs_fill_symlink
and move erofs_buf into the erofs_read_inode, so that erofs_fill_inode
can only deal with inode operation bindings and can be decoupled from
metabuf operations. This results in better calling conventions.

Note that after this patch, we do not need erofs_buf and ofs as
parameters any more when calling erofs_read_inode as
all the data operations are now included in itself.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV/
Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240902093412.509083-1-toolmanp@tlmp.cc
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
12 months agoerofs: use kmemdup_nul in erofs_fill_symlink
Yiyang Wu [Mon, 2 Sep 2024 08:31:46 +0000 (16:31 +0800)]
erofs: use kmemdup_nul in erofs_fill_symlink

Remove open coding in erofs_fill_symlink.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV
Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
Link: https://lore.kernel.org/r/20240902083147.450558-2-toolmanp@tlmp.cc
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
12 months agoerofs: mark experimental fscache backend deprecated
Gao Xiang [Fri, 30 Aug 2024 03:28:40 +0000 (11:28 +0800)]
erofs: mark experimental fscache backend deprecated

Although fscache is still described as "General Filesystem Caching" for
network filesystems and other things such as ISO9660 filesystems, it has
actually become a part of netfslib recently, which was unexpected at the
time when "EROFS over fscache" proposed (2021) since EROFS is entirely a
disk filesystem and the dependency is redundant.

Mark it deprecated and it will be removed after "fanotify pre-content
hooks" lands, which will provide the same functionality for EROFS.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240830032840.3783206-4-hsiangkao@linux.alibaba.com
12 months agoerofs: support compressed inodes for fileio
Gao Xiang [Fri, 30 Aug 2024 03:28:39 +0000 (11:28 +0800)]
erofs: support compressed inodes for fileio

Use pseudo bios just like the previous fscache approach since
merged bio_vecs can be filled properly with unique interfaces.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240830032840.3783206-3-hsiangkao@linux.alibaba.com
12 months agoerofs: support unencoded inodes for fileio
Gao Xiang [Thu, 5 Sep 2024 09:30:31 +0000 (17:30 +0800)]
erofs: support unencoded inodes for fileio

Since EROFS only needs to handle read requests in simple contexts,
Just directly use vfs_iocb_iter_read() for data I/Os.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240905093031.2745929-1-hsiangkao@linux.alibaba.com
12 months agoerofs: add file-backed mount support
Gao Xiang [Fri, 30 Aug 2024 03:28:37 +0000 (11:28 +0800)]
erofs: add file-backed mount support

It actually has been around for years: For containers and other sandbox
use cases, there will be thousands (and even more) of authenticated
(sub)images running on the same host, unlike OS images.

Of course, all scenarios can use the same EROFS on-disk format, but
bdev-backed mounts just work well for OS images since golden data is
dumped into real block devices.  However, it's somewhat hard for
container runtimes to manage and isolate so many unnecessary virtual
block devices safely and efficiently [1]: they just look like a burden
to orchestrators and file-backed mounts are preferred indeed.  There
were already enough attempts such as Incremental FS, the original
ComposeFS and PuzzleFS acting in the same way for immutable fses.  As
for current EROFS users, ComposeFS, containerd and Android APEXs will
be directly benefited from it.

On the other hand, previous experimental feature "erofs over fscache"
was once also intended to provide a similar solution (inspired by
Incremental FS discussion [2]), but the following facts show file-backed
mounts will be a better approach:
 - Fscache infrastructure has recently been moved into new Netfslib
   which is an unexpected dependency to EROFS really, although it
   originally claims "it could be used for caching other things such as
   ISO9660 filesystems too." [3]

 - It takes an unexpectedly long time to upstream Fscache/Cachefiles
   enhancements.  For example, the failover feature took more than
   one year, and the deamonless feature is still far behind now;

 - Ongoing HSM "fanotify pre-content hooks" [4] together with this will
   perfectly supersede "erofs over fscache" in a simpler way since
   developers (mainly containerd folks) could leverage their existing
   caching mechanism entirely in userspace instead of strictly following
   the predefined in-kernel caching tree hierarchy.

After "fanotify pre-content hooks" lands upstream to provide the same
functionality, "erofs over fscache" will be removed then (as an EROFS
internal improvement and EROFS will not have to bother with on-demand
fetching and/or caching improvements anymore.)

[1] https://github.com/containers/storage/pull/2039
[2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@mail.gmail.com
[3] https://docs.kernel.org/filesystems/caching/fscache.html
[4] https://lore.kernel.org/r/cover.1723670362.git.josef@toxicpanda.com

Closes: https://github.com/containers/composefs/issues/144
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240830032840.3783206-1-hsiangkao@linux.alibaba.com
12 months agoerofs: handle overlapped pclusters out of crafted images properly
Gao Xiang [Tue, 10 Sep 2024 07:08:47 +0000 (15:08 +0800)]
erofs: handle overlapped pclusters out of crafted images properly

syzbot reported a task hang issue due to a deadlock case where it is
waiting for the folio lock of a cached folio that will be used for
cache I/Os.

After looking into the crafted fuzzed image, I found it's formed with
several overlapped big pclusters as below:

 Ext:   logical offset   |  length :     physical offset    |  length
   0:        0..   16384 |   16384 :     151552..    167936 |   16384
   1:    16384..   32768 |   16384 :     155648..    172032 |   16384
   2:    32768..   49152 |   16384 :  537223168.. 537239552 |   16384
...

Here, extent 0/1 are physically overlapped although it's entirely
_impossible_ for normal filesystem images generated by mkfs.

First, managed folios containing compressed data will be marked as
up-to-date and then unlocked immediately (unlike in-place folios) when
compressed I/Os are complete.  If physical blocks are not submitted in
the incremental order, there should be separate BIOs to avoid dependency
issues.  However, the current code mis-arranges z_erofs_fill_bio_vec()
and BIO submission which causes unexpected BIO waits.

Second, managed folios will be connected to their own pclusters for
efficient inter-queries.  However, this is somewhat hard to implement
easily if overlapped big pclusters exist.  Again, these only appear in
fuzzed images so let's simply fall back to temporary short-lived pages
for correctness.

Additionally, it justifies that referenced managed folios cannot be
truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy
up `struct z_erofs_bvec`") for simplicity although it shouldn't be any
difference.

Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com
Reported-by: syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com
Reported-by: syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com
Tested-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com
Fixes: 8e6c8fa9f2e9 ("erofs: enable big pcluster feature")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240910070847.3356592-1-hsiangkao@linux.alibaba.com
12 months agodrm/i915/guc: prevent a possible int overflow in wq offsets
Nikita Zhandarovich [Thu, 25 Jul 2024 15:59:25 +0000 (08:59 -0700)]
drm/i915/guc: prevent a possible int overflow in wq offsets

It may be possible for the sum of the values derived from
i915_ggtt_offset() and __get_parent_scratch_offset()/
i915_ggtt_offset() to go over the u32 limit before being assigned
to wq offsets of u64 type.

Mitigate these issues by expanding one of the right operands
to u64 to avoid any overflow issues just in case.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://patchwork.freedesktop.org/patch/msgid/20240725155925.14707-1-n.zhandarovich@fintech.ru
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1f1c1bd56620b80ae407c5790743e17caad69cec)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
12 months agodma-mapping: add tracing for dma-mapping API calls
Sean Anderson [Fri, 6 Sep 2024 21:54:34 +0000 (17:54 -0400)]
dma-mapping: add tracing for dma-mapping API calls

When debugging drivers, it can often be useful to trace when memory gets
(un)mapped for DMA (and can be accessed by the device). Add some
tracepoints for this purpose.

Use u64 instead of phys_addr_t and dma_addr_t (and similarly %llx instead
of %pa) because libtraceevent can't handle typedefs in all cases.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
12 months agoMerge tag 'drm-xe-next-2024-09-05' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Tue, 10 Sep 2024 03:17:56 +0000 (13:17 +1000)]
Merge tag 'drm-xe-next-2024-09-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Cross-subsystem Changes:
- Split dma fence array creation into alloc and arm (Matthew Brost)

Driver Changes:
- Move kernel_lrc to execlist backend (Ilia)
- Fix type width for pcode coommand (Karthik)
- Make xe_drm.h include unambiguous (Jani)
- Fixes and debug improvements for GSC load (Daniele)
- Track resources and VF state by PF (Michal Wajdeczko)
- Fix memory leak on error path (Nirmoy)
- Cleanup header includes (Matt Roper)
- Move pcode logic to tile scope (Matt Roper)
- Move hwmon logic to device scope (Matt Roper)
- Fix media TLB invalidation (Matthew Brost)
- Threshold config fixes for PF (Michal Wajdeczko)
- Remove extra "[drm]" from logs (Michal Wajdeczko)
- Add missing runtime ref (Rodrigo Vivi)
- Fix circular locking on runtime suspend (Rodrigo Vivi)
- Fix rpm in TTM swapout path (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/eirx5vdvoflbbqlrzi5cip6bpu3zjojm2pxseufu3rlq4pp6xv@eytjvhizfyu6
12 months agoMerge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'
Jakub Kicinski [Tue, 10 Sep 2024 02:20:42 +0000 (19:20 -0700)]
Merge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'

Brett Creeley says:

====================
ionic: convert Rx queue buffers to use page_pool

Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure.  This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.

The result is code that more closely follows current patterns, as well as
a either a performance boost or equivalent performance as seen with
iperf testing:

  mss   netio    tx_pps      rx_pps      total_pps      tx_bw    rx_bw    total_bw
  ----  -------  ----------  ----------  -----------  -------  -------  ----------
Before:
  256   bidir    13,839,293  15,515,227  29,354,520        34       38          71
  512   bidir    13,913,249  14,671,693  28,584,942        62       65         127
  1024  bidir    13,006,189  13,695,413  26,701,602       109      115         224
  1448  bidir    12,489,905  12,791,734  25,281,639       145      149         294
  2048  bidir    9,195,622   9,247,649   18,443,271       148      149         297
  4096  bidir    5,149,716   5,247,917   10,397,633       160      163         323
  8192  bidir    3,029,993   3,008,882   6,038,875        179      179         358
  9000  bidir    2,789,358   2,800,744   5,590,102        181      180         361

After:
  256   bidir    21,540,037  21,344,644  42,884,681        52       52         104
  512   bidir    23,170,014  19,207,260  42,377,274       103       85         188
  1024  bidir    17,934,280  17,819,247  35,753,527       150      149         299
  1448  bidir    15,242,515  14,907,030  30,149,545       167      174         341
  2048  bidir    10,692,542  10,663,023  21,355,565       177      176         353
  4096  bidir    6,024,977   6,083,580   12,108,557       187      180         367
  8192  bidir    3,090,449   3,048,266   6,138,715        180      176         356
  9000  bidir    2,859,146   2,864,226   5,723,372        178      180         358

v2: https://lore.kernel.org/20240826184422.21895-1-brett.creeley@amd.com
v1: https://lore.kernel.org/20240625165658.34598-1-shannon.nelson@amd.com
====================

Link: https://patch.msgid.link/20240906232623.39651-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: Allow XDP program to be hot swapped
Brett Creeley [Fri, 6 Sep 2024 23:26:23 +0000 (16:26 -0700)]
ionic: Allow XDP program to be hot swapped

Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent the
q->xdp_prog to be read/written more than once use READ_ONCE() and
WRITE_ONCE() on the q->xdp_prog.

The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.

Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-8-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: convert Rx queue buffers to use page_pool
Shannon Nelson [Fri, 6 Sep 2024 23:26:22 +0000 (16:26 -0700)]
ionic: convert Rx queue buffers to use page_pool

Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure.  This
converts the Rx traffic queues to use page_pool.

Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.

include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: Fully reconfigure queues when going to/from a NULL XDP program
Brett Creeley [Fri, 6 Sep 2024 23:26:21 +0000 (16:26 -0700)]
ionic: Fully reconfigure queues when going to/from a NULL XDP program

Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.

In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: always use rxq_info
Shannon Nelson [Fri, 6 Sep 2024 23:26:20 +0000 (16:26 -0700)]
ionic: always use rxq_info

Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use.  This is the more common use pattern and better supports
future conversion to page_pool.  Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: use per-queue xdp_prog
Shannon Nelson [Fri, 6 Sep 2024 23:26:19 +0000 (16:26 -0700)]
ionic: use per-queue xdp_prog

We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program.  With that, we only built the per queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know to how build
the buffers.  However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer cacheing a little problematic.

This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp().  We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: rename ionic_xdp_rx_put_bufs
Shannon Nelson [Fri, 6 Sep 2024 23:26:18 +0000 (16:26 -0700)]
ionic: rename ionic_xdp_rx_put_bufs

We aren't "putting" buf, we're just unlinking them from our tracking in
order to let the XDP_TX and XDP_REDIRECT tx clean paths take care of the
pages when they are done with them.  This rename clears up the intent.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoionic: debug line for Tx completion errors
Shannon Nelson [Fri, 6 Sep 2024 23:26:17 +0000 (16:26 -0700)]
ionic: debug line for Tx completion errors

Here's a little debugging aid in case the device starts throwing
Tx completion errors.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agopowerpc: Switch back to struct platform_driver::remove()
Uwe Kleine-König [Mon, 9 Sep 2024 13:09:02 +0000 (15:09 +0200)]
powerpc: Switch back to struct platform_driver::remove()

After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.

Convert all pwm drivers to use .remove(), with the eventual goal to drop
struct platform_driver::remove_new(). As .remove() and .remove_new() have
the same prototypes, conversion is done by just changing the structure
member name in the driver initializer.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240909130902.851274-2-u.kleine-koenig@baylibre.com
12 months agopowerpc/pseries/eeh: Fix pseries_eeh_err_inject
Narayana Murty N [Mon, 9 Sep 2024 14:02:20 +0000 (09:02 -0500)]
powerpc/pseries/eeh: Fix pseries_eeh_err_inject

VFIO_EEH_PE_INJECT_ERR ioctl is currently failing on pseries
due to missing implementation of err_inject eeh_ops for pseries.
This patch implements pseries_eeh_err_inject in eeh_ops/pseries
eeh_ops. Implements support for injecting MMIO load/store error
for testing from user space.

The check on PCI error type (bus type) code is moved to platform
code, since the eeh_pe_inject_err can be allowed to more error
types depending on platform requirement. Removal of the check for
'type' in eeh_pe_inject_err() doesn't impact PowerNV as
pnv_eeh_err_inject() already has an equivalent check in place.

Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240909140220.529333-1-nnmlinux@linux.ibm.com
12 months agoMerge branch 'rx-software-timestamp-for-all-round-3'
Jakub Kicinski [Tue, 10 Sep 2024 00:44:52 +0000 (17:44 -0700)]
Merge branch 'rx-software-timestamp-for-all-round-3'

Gal Pressman says:

====================
RX software timestamp for all - round 3

Rounds 1 & 2 of drivers conversion were merged [1][2], this round will
complete the work.

[1] https://lore.kernel.org/netdev/20240901112803.212753-1-gal@nvidia.com/
[2] https://lore.kernel.org/netdev/20240904074922.256275-1-gal@nvidia.com/
====================

Link: https://patch.msgid.link/20240906144632.404651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoptp: ptp_ines: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:32 +0000 (17:46 +0300)]
ptp: ptp_ines: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-17-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoixp4xx_eth: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:31 +0000 (17:46 +0300)]
ixp4xx_eth: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20240906144632.404651-16-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: stmmac: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:30 +0000 (17:46 +0300)]
net: stmmac: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-15-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agosfc/siena: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:29 +0000 (17:46 +0300)]
sfc/siena: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-14-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agosfc: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:28 +0000 (17:46 +0300)]
sfc: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-13-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoqede: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:27 +0000 (17:46 +0300)]
qede: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-12-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: mscc: ocelot: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:26 +0000 (17:46 +0300)]
net: mscc: ocelot: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-11-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet/funeth: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:25 +0000 (17:46 +0300)]
net/funeth: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-10-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoenic: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:24 +0000 (17:46 +0300)]
enic: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-9-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: thunderx: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:23 +0000 (17:46 +0300)]
net: thunderx: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-8-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoliquidio: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:22 +0000 (17:46 +0300)]
liquidio: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-7-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: macb: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:21 +0000 (17:46 +0300)]
net: macb: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://patch.msgid.link/20240906144632.404651-6-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoamd-xgbe: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:20 +0000 (17:46 +0300)]
amd-xgbe: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20240906144632.404651-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agobonding: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:19 +0000 (17:46 +0300)]
bonding: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agotg3: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:18 +0000 (17:46 +0300)]
tg3: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agobnxt_en: Remove setting of RX software timestamp
Gal Pressman [Fri, 6 Sep 2024 14:46:17 +0000 (17:46 +0300)]
bnxt_en: Remove setting of RX software timestamp

The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.

Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ti: icssg-prueth: Make pa_stats optional
MD Danish Anwar [Fri, 6 Sep 2024 09:36:49 +0000 (15:06 +0530)]
net: ti: icssg-prueth: Make pa_stats optional

pa_stats is optional in dt bindings, make it optional in driver as well.
Currently if pa_stats syscon regmap is not found driver returns -ENODEV.
Fix this by not returning an error in case pa_stats is not found and
continue generating ethtool stats without pa_stats.

Fixes: 550ee90ac61c ("net: ti: icssg-prueth: Add support for PA Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240906093649.870883-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ibm: emac: Use __iomem annotation for emac_[xg]aht_base
Simon Horman [Fri, 6 Sep 2024 07:36:09 +0000 (08:36 +0100)]
net: ibm: emac: Use __iomem annotation for emac_[xg]aht_base

dev->emacp contains an __iomem pointer and values derived
from it are used as __iomem pointers. So use this annotation
in the return type for helpers that derive pointers from dev->emacp.

Flagged by Sparse as:

.../core.c:444:36: warning: incorrect type in argument 1 (different address spaces)
.../core.c:444:36:    expected unsigned int volatile [noderef] [usertype] __iomem *addr
.../core.c:444:36:    got unsigned int [usertype] *
.../core.c: note: in included file:
.../core.h:416:25: warning: cast removes address space '__iomem' of expression

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240906-emac-iomem-v1-1-207cc4f3fed0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'selftests-net-add-packetdrill'
Jakub Kicinski [Tue, 10 Sep 2024 00:38:03 +0000 (17:38 -0700)]
Merge branch 'selftests-net-add-packetdrill'

Willem de Bruijn says:

====================
selftests/net: add packetdrill

Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.

1/2: add kselftest infra for TEST_PROGS that need an interpreter

2/2: add the specific packetdrill tests
====================

Link: https://patch.msgid.link/20240905231653.2427327-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoselftests/net: integrate packetdrill with ksft
Willem de Bruijn [Thu, 5 Sep 2024 23:15:52 +0000 (19:15 -0400)]
selftests/net: integrate packetdrill with ksft

Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.

Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").

This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.

Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.

Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.

The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.

Implementation notes:
- restore alphabetical order when adding the new directory to
  tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
  except for
    - update `source ./defaults.sh` path (to adjust for flat dir)
    - add SPDX headers
    - remove one author statement
    - Acknowledgment: drop an e (checkpatch)

Tested:
make -C tools/testing/selftests \
  TARGETS=net/packetdrill \
  run_tests

make -C tools/testing/selftests \
  TARGETS=net/packetdrill \
  install INSTALL_PATH=$KSFT_INSTALL_PATH

# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt

Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoselftests: support interpreted scripts with ksft_runner.sh
Willem de Bruijn [Thu, 5 Sep 2024 23:15:51 +0000 (19:15 -0400)]
selftests: support interpreted scripts with ksft_runner.sh

Support testcases that are themselves not executable, but need an
interpreter to run them.

If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run

    ./ksft_runner.sh ./$BASENAME_TEST

Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.

Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.

Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agofou: fix initialization of grc
Muhammad Usama Anjum [Fri, 6 Sep 2024 10:28:39 +0000 (15:28 +0500)]
fou: fix initialization of grc

The grc must be initialize first. There can be a condition where if
fou is NULL, goto out will be executed and grc would be used
uninitialized.

Fixes: 7e4196935069 ("fou: Fix null-ptr-deref in GRO.")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240906102839.202798-1-usama.anjum@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'various-cleanups'
Jakub Kicinski [Tue, 10 Sep 2024 00:17:45 +0000 (17:17 -0700)]
Merge branch 'various-cleanups'

Rosen Penev says:

====================
various cleanups

Allow CI to build. Also a bugfix for dual GMAC devices.
====================

Link: https://patch.msgid.link/20240905194938.8453-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: disable napi interrupts during probe
Sven Eckelmann [Thu, 5 Sep 2024 19:49:38 +0000 (12:49 -0700)]
net: ag71xx: disable napi interrupts during probe

ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
interrupts. The handler is trying to use napi_schedule to handle the
processing of packets. But the netif_napi_add for this device is
called a lot later in ag71xx_probe.

It can therefore happen that a still running gmac0/gmac1 is triggering the
interrupt handler with a bit from AG71XX_INT_POLL set in
AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
napi code will crash the system because the ag->napi is not yet
initialized.

The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
AG71XX_INT_POLL related status bits as interrupt before registering the
interrupt handler. ag71xx_hw_start will take care of re-initializing the
AG71XX_REG_INT_ENABLE.

This will become relevant when dual GMAC devices get added here.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240905194938.8453-8-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: remove always true branch
Rosen Penev [Thu, 5 Sep 2024 19:49:37 +0000 (12:49 -0700)]
net: ag71xx: remove always true branch

The opposite of this condition is checked above and if true, function
returns. Which means this can never be false.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-7-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: get reset control using devm api
Rosen Penev [Thu, 5 Sep 2024 19:49:36 +0000 (12:49 -0700)]
net: ag71xx: get reset control using devm api

Currently, the of variant is missing reset_control_put in error paths.
The devm variant does not require it.

Allows removing mdio_reset from the struct as it is not used outside the
function.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-6-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: use ethtool_puts
Rosen Penev [Thu, 5 Sep 2024 19:49:35 +0000 (12:49 -0700)]
net: ag71xx: use ethtool_puts

Allows simplifying get_strings and avoids manual pointer manipulation.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-5-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: update FIFO bits and descriptions
Rosen Penev [Thu, 5 Sep 2024 19:49:34 +0000 (12:49 -0700)]
net: ag71xx: update FIFO bits and descriptions

Taken from QCA SDK. No functional difference as same bits get applied.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-4-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: add MODULE_DESCRIPTION
Rosen Penev [Thu, 5 Sep 2024 19:49:33 +0000 (12:49 -0700)]
net: ag71xx: add MODULE_DESCRIPTION

Now that COMPILE_TEST is enabled, it gets flagged when building with
allmodconfig W=1 builds. Text taken from the beginning of the file.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-3-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agonet: ag71xx: add COMPILE_TEST to test compilation
Rosen Penev [Thu, 5 Sep 2024 19:49:32 +0000 (12:49 -0700)]
net: ag71xx: add COMPILE_TEST to test compilation

While this driver is meant for MIPS only, it can be compiled on x86 just
fine. Remove pointless parentheses while at it.

Enables CI building of this driver.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-2-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 months agoMerge branch 'af_unix-correct-manage_oob-when-oob-follows-a-consumed-oob'
Jakub Kicinski [Tue, 10 Sep 2024 00:14:28 +0000 (17:14 -0700)]
Merge branch 'af_unix-correct-manage_oob-when-oob-follows-a-consumed-oob'

Kuniyuki Iwashima says:

====================
af_unix: Correct manage_oob() when OOB follows a consumed OOB.

Recently syzkaller reported UAF of OOB skb.

The bug was introduced by commit 93c99f21db36 ("af_unix: Don't stop
recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") but uncovered
by another recent commit 8594d9b85c07 ("af_unix: Don't call skb_get()
for OOB skb.").

[0]: https://lore.kernel.org/netdev/00000000000083b05a06214c9ddc@google.com/
====================

Link: https://patch.msgid.link/20240905193240.17565-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>