]> www.infradead.org Git - users/jedix/linux-maple.git/log
users/jedix/linux-maple.git
7 months agomd: Add new_level sysfs interface
Xiao Ni [Wed, 4 Sep 2024 23:54:53 +0000 (07:54 +0800)]
md: Add new_level sysfs interface

Now reshape supports two ways: with backup file or without backup file.
For the situation without backup file, it needs to change data offset.
It doesn't need systemd service mdadm-grow-continue. So it can finish
the reshape job in one process environment. It can know the new level
from mdadm --grow command and can change to new level after reshape
finishes.

For the situation with backup file, it needs systemd service
mdadm-grow-continue to monitor reshape progress. So there are two process
envolved. One is mdadm --grow command whick kicks off reshape and wakes
up mdadm-grow-continue service. The second process is the service, which
doesn't know the new level from the first process.

In kernel space mddev->new_level is used to record the new level when
doing reshape. This patch adds a new interface to help mdadm update
new_level and sync it to metadata. Then mdadm-grow-continue can read the
right new_level.

Commit log revised by Song Liu. Please refer to the link for more details.

Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20240904235453.99120-1-xni@redhat.com
Signed-off-by: Song Liu <song@kernel.org>
7 months agomd: Report failed arrays as broken in mdstat
Mateusz Kusiak [Tue, 3 Sep 2024 14:29:49 +0000 (16:29 +0200)]
md: Report failed arrays as broken in mdstat

Depending on if array has personality, it is either reported as active or
inactive. This patch adds third status "broken" for arrays with
personality that became inoperative. The reason is end users tend to
assume that "active" indicates array is operational.

Add "broken" state for inoperative arrays with personality and refactor
the code.

Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Link: https://lore.kernel.org/r/20240903142949.53628-1-mateusz.kusiak@intel.com
Signed-off-by: Song Liu <song@kernel.org>
7 months agoMerge branch 'md-6.12-raid5-opt' into md-6.12
Song Liu [Thu, 29 Aug 2024 18:22:13 +0000 (11:22 -0700)]
Merge branch 'md-6.12-raid5-opt' into md-6.12

From Artur:

The wait_for_overlap wait queue is currently used in two cases, which
are not really related:
 - waiting for actual overlapping bios, which uses R5_Overlap bit,
 - waiting for events related to reshape.

Handling every write request in raid5_make_request() involves adding to
and removing from this wait queue, which uses a spinlock. With fast
storage and multiple submitting threads the contention on this lock is
noticeable.

This patch series aims to resolve this by separating the two cases
mentioned above and using this wait queue only when reshape is in
progress.

The results when testing 4k random writes on raid5 with null_blk
(8 jobs, qd=64, group_thread_cnt=8):
before: 463k IOPS
after:  523k IOPS

The improvement is not huge with this series alone but it is just one of
the bottlenecks. When applied onto some other changes I'm working on, it
allowed to go from 845k IOPS to 975k IOPS on the same test.

* md-6.12-raid5-opt:
  md/raid5: rename wait_for_overlap to wait_for_reshape
  md/raid5: only add to wq if reshape is in progress
  md/raid5: use wait_on_bit() for R5_Overlap

Signed-off-by: Song Liu <song@kernel.org>
7 months agomd/raid5: rename wait_for_overlap to wait_for_reshape
Artur Paszkiewicz [Tue, 27 Aug 2024 15:35:36 +0000 (17:35 +0200)]
md/raid5: rename wait_for_overlap to wait_for_reshape

The only remaining uses of wait_for_overlap are related to reshape so
rename it accordingly.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Link: https://lore.kernel.org/r/20240827153536.6743-4-artur.paszkiewicz@intel.com
Signed-off-by: Song Liu <song@kernel.org>
7 months agomd/raid5: only add to wq if reshape is in progress
Artur Paszkiewicz [Tue, 27 Aug 2024 15:35:35 +0000 (17:35 +0200)]
md/raid5: only add to wq if reshape is in progress

Now that actual overlaps are not handled on the wait_for_overlap wq
anymore, the remaining cases when we wait on this wq are limited to
reshape. If reshape is not in progress, don't add to the wq in
raid5_make_request() because add_wait_queue() / remove_wait_queue()
operations take a spinlock and cause noticeable contention when multiple
threads are submitting requests to the mddev.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Link: https://lore.kernel.org/r/20240827153536.6743-3-artur.paszkiewicz@intel.com
Signed-off-by: Song Liu <song@kernel.org>
7 months agomd/raid5: use wait_on_bit() for R5_Overlap
Artur Paszkiewicz [Tue, 27 Aug 2024 15:35:34 +0000 (17:35 +0200)]
md/raid5: use wait_on_bit() for R5_Overlap

Convert uses of wait_for_overlap wait queue with R5_Overlap bit to
wait_on_bit() / wake_up_bit().

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Link: https://lore.kernel.org/r/20240827153536.6743-2-artur.paszkiewicz@intel.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agoMerge branch 'md-6.12-bitmap' into md-6.12
Song Liu [Wed, 28 Aug 2024 21:55:57 +0000 (14:55 -0700)]
Merge branch 'md-6.12-bitmap' into md-6.12

From Yu Kuai (with minor changes by Song Liu):

The background is that currently bitmap is using a global spin_lock,
causing lock contention and huge IO performance degradation for all raid
levels.

However, it's impossible to implement a new lock free bitmap with
current situation that md-bitmap exposes the internal implementation
with lots of exported apis. Hence bitmap_operations is invented, to
describe bitmap core implementation, and a new bitmap can be introduced
with a new bitmap_operations, we only need to switch to the new one
during initialization.

And with this we can build bitmap as kernel module, but that's not
our concern for now.

This version was tested with mdadm tests and lvm2 tests. This set does
not introduce new errors in these tests.

* md-6.12-bitmap: (42 commits)
  md/md-bitmap: make in memory structure internal
  md/md-bitmap: merge md_bitmap_enabled() into bitmap_operations
  md/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations
  md/md-bitmap: merge md_bitmap_free() into bitmap_operations
  md/md-bitmap: merge md_bitmap_set_pages() into struct bitmap_operations
  md/md-bitmap: merge md_bitmap_copy_from_slot() into struct bitmap_operation.
  md/md-bitmap: merge get_bitmap_from_slot() into bitmap_operations
  md/md-bitmap: merge md_bitmap_resize() into bitmap_operations
  md/md-bitmap: pass in mddev directly for md_bitmap_resize()
  md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations
  md/md-bitmap: merge bitmap_unplug() into bitmap_operations
  md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug()
  md/md-bitmap: merge md_bitmap_sync_with_cluster() into bitmap_operations
  md/md-bitmap: merge md_bitmap_cond_end_sync() into bitmap_operations
  md/md-bitmap: merge md_bitmap_close_sync() into bitmap_operations
  md/md-bitmap: merge md_bitmap_end_sync() into bitmap_operations
  md/md-bitmap: remove the parameter 'aborted' for md_bitmap_end_sync()
  md/md-bitmap: merge md_bitmap_start_sync() into bitmap_operations
  md/md-bitmap: merge md_bitmap_endwrite() into bitmap_operations
  md/md-bitmap: merge md_bitmap_startwrite() into bitmap_operations
  ...

Signed-off-by: Song Liu <song@kernel.org>
8 months agomd: Remove flush handling
Yu Kuai [Tue, 27 Aug 2024 11:06:16 +0000 (19:06 +0800)]
md: Remove flush handling

For flush request, md has a special flush handling to merge concurrent
flush request into single one, however, the whole mechanism is based on
a disk level spin_lock 'mddev->lock'. And fsync can be called quite
often in some user cases, for consequence, spin lock from IO fast path can
cause performance degradation.

Fortunately, the block layer already has flush handling to merge
concurrent flush request, and it only acquires hctx level spin lock. (see
details in blk-flush.c)

This patch removes the flush handling in md, and converts to use general
block layer flush handling in underlying disks.

Flush test for 4 nvme raid10:
start 128 threads to do fsync 100000 times, on arm64, see how long it
takes.

Test script:
void* thread_func(void* arg) {
    int fd = *(int*)arg;
    for (int i = 0; i < FSYNC_COUNT; i++) {
        fsync(fd);
    }
    return NULL;
}

int main() {
    int fd = open("/dev/md0", O_RDWR);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    pthread_t threads[THREADS];
    struct timeval start, end;

    gettimeofday(&start, NULL);

    for (int i = 0; i < THREADS; i++) {
        pthread_create(&threads[i], NULL, thread_func, &fd);
    }

    for (int i = 0; i < THREADS; i++) {
        pthread_join(threads[i], NULL);
    }

    gettimeofday(&end, NULL);

    close(fd);

    long long elapsed = (end.tv_sec - start.tv_sec) * 1000000LL + (end.tv_usec - start.tv_usec);
    printf("Elapsed time: %lld microseconds\n", elapsed);

    return 0;
}

Test result: about 10 times faster:
Before this patch: 50943374 microseconds
After this patch:  5096347  microseconds

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240827110616.3860190-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: make in memory structure internal
Yu Kuai [Mon, 26 Aug 2024 07:44:52 +0000 (15:44 +0800)]
md/md-bitmap: make in memory structure internal

Now that struct bitmap_page and bitmap is not used externally anymore,
move them from md-bitmap.h to md-bitmap.c (expect that dm-raid is still
using define marco 'COUNTER_MAX').

Also fix some checkpatch warnings.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-43-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_enabled() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:51 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_enabled() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-42-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:50 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-41-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_free() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:49 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_free() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
o invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-40-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_set_pages() into struct bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:48 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_set_pages() into struct bitmap_operations

o that the implementation won't be exposed, and it'll be possible
o invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-39-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_copy_from_slot() into struct bitmap_operation.
Yu Kuai [Mon, 26 Aug 2024 07:44:47 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_copy_from_slot() into struct bitmap_operation.

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-38-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge get_bitmap_from_slot() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:46 +0000 (15:44 +0800)]
md/md-bitmap: merge get_bitmap_from_slot() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-37-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_resize() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:45 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_resize() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-36-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: pass in mddev directly for md_bitmap_resize()
Yu Kuai [Mon, 26 Aug 2024 07:44:44 +0000 (15:44 +0800)]
md/md-bitmap: pass in mddev directly for md_bitmap_resize()

And move the condition "if (mddev->bitmap)" into md_bitmap_resize() as
well, on the one hand make code cleaner, on the other hand try not to
access bitmap directly.

Since we are here, also change the parameter 'init' from int to bool.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-35-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:43 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-34-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge bitmap_unplug() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:42 +0000 (15:44 +0800)]
md/md-bitmap: merge bitmap_unplug() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-33-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug()
Yu Kuai [Mon, 26 Aug 2024 07:44:41 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug()

Add a parameter 'bool sync' to distinguish them, and
md_bitmap_unplug_async() won't be exported anymore, hence
bitmap_operations only need one op to cover them.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-32-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_sync_with_cluster() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:40 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_sync_with_cluster() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-31-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_cond_end_sync() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:39 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_cond_end_sync() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-30-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_close_sync() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:38 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_close_sync() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-29-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_end_sync() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:37 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_end_sync() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-28-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: remove the parameter 'aborted' for md_bitmap_end_sync()
Yu Kuai [Mon, 26 Aug 2024 07:44:36 +0000 (15:44 +0800)]
md/md-bitmap: remove the parameter 'aborted' for md_bitmap_end_sync()

For internal callers, aborted are always set to false, while for
external callers, aborted are always set to true.

Hence there is no need to always pass in true for exported api.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-27-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_start_sync() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:35 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_start_sync() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

Also fix lots of code style.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-26-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_endwrite() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:34 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_endwrite() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible. And change the type
of 'success' and 'behind' from int to bool.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-25-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_startwrite() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:33 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_startwrite() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible. And change the type
of 'behind' from int to bool.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-24-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_dirty_bits() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:32 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_dirty_bits() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

And while we're here, also fix coding style for bitmap_store().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-23-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge bitmap_write_all() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:31 +0000 (15:44 +0800)]
md/md-bitmap: merge bitmap_write_all() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Also change the parameter from bitmap to mddev, to avoid access
bitmap outside md-bitmap.c as much as possible.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-22-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: remove md_bitmap_setallbits()
Yu Kuai [Mon, 26 Aug 2024 07:44:30 +0000 (15:44 +0800)]
md/md-bitmap: remove md_bitmap_setallbits()

md_bitmap_setallbits() is not used, hence can be removed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-21-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_status() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:29 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_status() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-20-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_update_sb() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:28 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_update_sb() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-19-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: make md_bitmap_print_sb() internal
Yu Kuai [Mon, 26 Aug 2024 07:44:27 +0000 (15:44 +0800)]
md/md-bitmap: make md_bitmap_print_sb() internal

md_bitmap_print_sb() is only used inside md-bitmap.c, hence make it
static, also rename it to bitmap_print_sb.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-18-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_flush() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:26 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_flush() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-17-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_destroy() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:25 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_destroy() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-16-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_load() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:24 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_load() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-15-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: merge md_bitmap_create() into bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:23 +0000 (15:44 +0800)]
md/md-bitmap: merge md_bitmap_create() into bitmap_operations

So that the implementation won't be exposed, and it'll be possible
to invent a new bitmap by replacing bitmap_operations.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-14-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: simplify md_bitmap_create() + md_bitmap_load()
Yu Kuai [Mon, 26 Aug 2024 07:44:22 +0000 (15:44 +0800)]
md/md-bitmap: simplify md_bitmap_create() + md_bitmap_load()

Other than internal api get_bitmap_from_slot(), all other places will
set returned bitmap to mddev->bitmap. So move the setting of
mddev->bitmap into md_bitmap_create() to simplify code.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-13-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: introduce struct bitmap_operations
Yu Kuai [Mon, 26 Aug 2024 07:44:21 +0000 (15:44 +0800)]
md/md-bitmap: introduce struct bitmap_operations

The structure is empty for now, and will be used in later patches to
merge in bitmap operations, so that bitmap implementation won't be
exposed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-12-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: add a new helper md_bitmap_set_pages()
Yu Kuai [Mon, 26 Aug 2024 07:44:20 +0000 (15:44 +0800)]
md/md-bitmap: add a new helper md_bitmap_set_pages()

Currently md-cluster will set bitmap->counts.pages directly, add a
helper to do this to avoid dereferencing bitmap directly.

Noted that after this patch bitmap is not dereferenced directly anymore
and following patches will move the structure inside md-bitmap.c.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-11-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-cluster: use helper md_bitmap_get_stats() to get pages in resize_bitmaps()
Yu Kuai [Mon, 26 Aug 2024 07:44:19 +0000 (15:44 +0800)]
md/md-cluster: use helper md_bitmap_get_stats() to get pages in resize_bitmaps()

Use the existed helper instead of open coding it, avoid dereferencing
bitmap directly to prepare inventing a new bitmap.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-10-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: add 'behind_writes' and 'behind_wait' into struct md_bitmap_stats
Yu Kuai [Mon, 26 Aug 2024 07:44:18 +0000 (15:44 +0800)]
md/md-bitmap: add 'behind_writes' and 'behind_wait' into struct md_bitmap_stats

There are no functional changes, avoid dereferencing bitmap directly to
prepare inventing a new bitmap.

Also fix following checkpatch warning by using wq_has_sleeper().

WARNING: waitqueue_active without comment

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-9-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: add 'file_pages' into struct md_bitmap_stats
Yu Kuai [Mon, 26 Aug 2024 07:44:17 +0000 (15:44 +0800)]
md/md-bitmap: add 'file_pages' into struct md_bitmap_stats

There are no functional changes, avoid dereferencing bitmap directly to
prepare inventing a new bitmap.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-8-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: add 'sync_size' into struct md_bitmap_stats
Yu Kuai [Mon, 26 Aug 2024 07:44:16 +0000 (15:44 +0800)]
md/md-bitmap: add 'sync_size' into struct md_bitmap_stats

To avoid dereferencing bitmap directly in md-cluster to prepare
inventing a new bitmap.

BTW, also fix following checkpatch warnings:

WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead
WARNING: Deprecated use of 'kunmap_atomic', prefer 'kunmap_local' instead

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-7-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-cluster: fix spares warnings for __le64
Yu Kuai [Mon, 26 Aug 2024 07:44:15 +0000 (15:44 +0800)]
md/md-cluster: fix spares warnings for __le64

drivers/md/md-cluster.c:1220:22: warning: incorrect type in assignment (different base types)
drivers/md/md-cluster.c:1220:22:    expected unsigned long my_sync_size
drivers/md/md-cluster.c:1220:22:    got restricted __le64 [usertype] sync_size
drivers/md/md-cluster.c:1252:35: warning: incorrect type in assignment (different base types)
drivers/md/md-cluster.c:1252:35:    expected unsigned long sync_size
drivers/md/md-cluster.c:1252:35:    got restricted __le64 [usertype] sync_size
drivers/md/md-cluster.c:1253:41: warning: restricted __le64 degrades to integer

Fix the warnings by using le64_to_cpu() to convet __le64 to integer.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-6-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: add 'events_cleared' into struct md_bitmap_stats
Yu Kuai [Mon, 26 Aug 2024 07:44:14 +0000 (15:44 +0800)]
md/md-bitmap: add 'events_cleared' into struct md_bitmap_stats

Also add a new helper to get events_cleared to avoid dereferencing
bitmap directly to prepare inventing a new bitmap.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-5-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd: use new helper md_bitmap_get_stats() in update_array_info()
Yu Kuai [Mon, 26 Aug 2024 07:44:13 +0000 (15:44 +0800)]
md: use new helper md_bitmap_get_stats() in update_array_info()

There are no functional changes, avoid dereferencing bitmap directly to
prepare inventing a new bitmap.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-4-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats()
Yu Kuai [Mon, 26 Aug 2024 07:44:12 +0000 (15:44 +0800)]
md/md-bitmap: replace md_bitmap_status() with a new helper md_bitmap_get_stats()

There are no functional changes, and the new helper will be used in
multiple places in following patches to avoid dereferencing bitmap
directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-3-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
Yu Kuai [Mon, 26 Aug 2024 07:44:11 +0000 (15:44 +0800)]
md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()

Use the existed helper instead of open coding it to make the code cleaner.
There are no functional changes, and also avoid dereferencing bitmap
directly to prepare inventing a new bitmap.

Noted that this patch also export md_bitmap_wait_behind_writes(), which
is necessary for now, and the exported api will be removed in following
patches to convert bitmap apis into ops.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240826074452.1490072-2-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd/raid1: Clean up local variable 'b' from raid1_read_request()
Yu Kuai [Thu, 1 Aug 2024 13:30:08 +0000 (21:30 +0800)]
md/raid1: Clean up local variable 'b' from raid1_read_request()

The local variable will only be used onced, in the error path that
read_balance() failed to find a valid rdev to read. Since now the rdev
is ensured can't be removed from conf while IO is still pending,
remove the local variable and dereference rdev directly.

Since we're here, also remove an extra empty line, and unnecessary
type conversion from sector_t(u64) to unsigned long long.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240801133008.459998-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd: Don't flush sync_work in md_write_start()
Yu Kuai [Thu, 1 Aug 2024 12:47:46 +0000 (20:47 +0800)]
md: Don't flush sync_work in md_write_start()

Because flush sync_work may trigger mddev_suspend() if there are spares,
and this should never be done in IO path because mddev_suspend() is used
to wait for IO.

This problem is found by code review.

Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
Cc: stable@vger.kernel.org
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240801124746.242558-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu <song@kernel.org>
8 months agomd: convert comma to semicolon
Chen Ni [Tue, 16 Jul 2024 02:58:52 +0000 (10:58 +0800)]
md: convert comma to semicolon

Replace a comma between expression statements by a semicolon.

Fixes: 5e5702898e93 ("md/raid10: Handle read errors during recovery better.")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Link: https://lore.kernel.org/r/20240716025852.400259-1-nichen@iscas.ac.cn
Signed-off-by: Song Liu <song@kernel.org>
8 months agoblock: constify ext_pi_ref_escape()
Alexey Dobriyan [Mon, 12 Aug 2024 18:12:10 +0000 (21:12 +0300)]
block: constify ext_pi_ref_escape()

This function doesn't mutate data.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/d24611b3-dddf-473a-903d-39290db03b11@p183
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8 months agoblock: delete module stuff from t10-pi
Alexey Dobriyan [Mon, 12 Aug 2024 18:10:33 +0000 (21:10 +0300)]
block: delete module stuff from t10-pi

It is not possible to build t10-pi.ko anymore.

This file doesn't even export functions.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/216ccc79-5b80-47b2-b507-990951aa810c@p183
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8 months agoublk: move zone report data out of request pdu
Ming Lei [Mon, 12 Aug 2024 01:36:24 +0000 (09:36 +0800)]
ublk: move zone report data out of request pdu

ublk zoned takes 16 bytes in each request pdu just for handling REPORT_ZONE
operation, this way does waste memory since request pdu is allocated
statically.

Store the transient zone report data into one global xarray, and remove
it after the report zone request is completed. This way is reasonable
since report zone is run in slow code path.

Fixes: 29802d7ca33b ("ublk: enable zoned storage support")
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240812013624.587587-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8 months agodrbd: Remove unused extern declarations
YueHaibing [Fri, 2 Aug 2024 09:51:47 +0000 (17:51 +0800)]
drbd: Remove unused extern declarations

Commit b411b3637fa7 ("The DRBD driver") declared but never implemented
drbd_read_remote(), is_valid_ar_handle() and drbd_set_recv_tcq().
And commit 668700b40a7c ("drbd: Create a dedicated workqueue for sending acks on the control connection")
never implemented drbd_send_ping_wf().

Commit 2451fc3b2bd3 ("drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code")
leave w_e_reissue() declaration unused.

Commit 8fe605513ab4 ("drbd: Rename drbdd_init() -> drbd_receiver()")
rename drbdd_init() and leave unsued declaration. Also drbd_asender() is removed in
commit 1c03e52083c8 ("drbd: Rename asender to ack_receiver").

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20240802095147.2788218-1-yuehaibing@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agonbd: add support for rotational devices
Wouter Verhelst [Thu, 25 Jul 2024 16:45:36 +0000 (18:45 +0200)]
nbd: add support for rotational devices

The NBD protocol defines the flag NBD_FLAG_ROTATIONAL to flag that the
export in use should be treated as a rotational device.

Add support for that flag to the kernel driver.

Signed-off-by: Wouter Verhelst <w@uter.be>
Reviewed-by: Eric Blake <eblake@redhat.com>
Link: https://lore.kernel.org/r/20240725164536.1275851-1-w@uter.be
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agodrbd: use sendpages_ok() instead of sendpage_ok()
Ofir Gal [Thu, 18 Jul 2024 08:45:14 +0000 (11:45 +0300)]
drbd: use sendpages_ok() instead of sendpage_ok()

Currently _drbd_send_page() use sendpage_ok() in order to enable
MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
may represent contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When _drbd_send_page() sends an iterator that the first page is
sendable, but one of the other pages isn't skb_splice_from_iter() warns
and aborts the data transfer.

Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
solves the issue.

Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
Link: https://lore.kernel.org/r/20240718084515.3833733-4-ofir.gal@volumez.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agonvme-tcp: use sendpages_ok() instead of sendpage_ok()
Ofir Gal [Thu, 18 Jul 2024 08:45:13 +0000 (11:45 +0300)]
nvme-tcp: use sendpages_ok() instead of sendpage_ok()

Currently nvme_tcp_try_send_data() use sendpage_ok() in order to disable
MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
may represent contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When nvme_tcp_try_send_data() sends an iterator that the first page is
sendable, but one of the other pages isn't skb_splice_from_iter() warns
and aborts the data transfer.

Using the new helper sendpages_ok() in order to disable MSG_SPLICE_PAGES
solves the issue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
Link: https://lore.kernel.org/r/20240718084515.3833733-3-ofir.gal@volumez.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agonet: introduce helper sendpages_ok()
Ofir Gal [Thu, 18 Jul 2024 08:45:12 +0000 (11:45 +0300)]
net: introduce helper sendpages_ok()

Network drivers are using sendpage_ok() to check the first page of an
iterator in order to disable MSG_SPLICE_PAGES. The iterator can
represent list of contiguous pages.

When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
it requires all pages in the iterator to be sendable. Therefore it needs
to check that each page is sendable.

The patch introduces a helper sendpages_ok(), it returns true if all the
contiguous pages are sendable.

Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
this helper to check whether the page list is OK. If the helper does not
return true, the driver should remove MSG_SPLICE_PAGES flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240718084515.3833733-2-ofir.gal@volumez.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agoblk-ioprio: remove per-disk structure
Yu Kuai [Fri, 19 Jul 2024 07:15:06 +0000 (15:15 +0800)]
blk-ioprio: remove per-disk structure

ioprio works on the blk-cgroup level, all disks in the same cgroup
are the same, and the struct ioprio_blkg doesn't have anything in it.
Hence register the policy is enough, because cpd_alloc/free_fn will
be handled for each blk-cgroup, and there is no need to activate the
policy for disk. Hence remove blk_ioprio_init/exit and
ioprio_alloc/free_pd.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240719071506.158075-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agoblk-ioprio: remove ioprio_blkcg_from_bio()
Yu Kuai [Fri, 19 Jul 2024 07:15:05 +0000 (15:15 +0800)]
blk-ioprio: remove ioprio_blkcg_from_bio()

Currently, if config is enabled, then ioprio is always enabled by
default from blkcg_init_disk(), hence there is no point to check if
the policy is enabled from blkg in ioprio_blkcg_from_bio(). Hence remove
it and get blkcg directly from bio.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240719071506.158075-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agoblk-cgroup: check for pd_(alloc|free)_fn in blkcg_activate_policy()
Yu Kuai [Fri, 19 Jul 2024 07:15:04 +0000 (15:15 +0800)]
blk-cgroup: check for pd_(alloc|free)_fn in blkcg_activate_policy()

Currently all policies implement pd_(alloc|free)_fn, however, this is
not necessary for ioprio that only works for blkcg, not blkg.

There are no functional changes, prepare to cleanup activating ioprio
policy.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240719071506.158075-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 months agoLinux 6.11-rc1
Linus Torvalds [Sun, 28 Jul 2024 21:19:55 +0000 (14:19 -0700)]
Linux 6.11-rc1

9 months agoMerge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Sun, 28 Jul 2024 21:02:48 +0000 (14:02 -0700)]
Merge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix RPM package build error caused by an incorrect locale setup

 - Mark modules.weakdep as ghost in RPM package

 - Fix the odd combination of -S and -c in stack protector scripts,
   which is an error with the latest Clang

* tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Fix '-S -c' in x86 stack protector scripts
  kbuild: rpm-pkg: ghost modules.weakdep file
  kbuild: rpm-pkg: Fix C locale setup

9 months agominmax: simplify and clarify min_t()/max_t() implementation
Linus Torvalds [Sun, 28 Jul 2024 20:50:01 +0000 (13:50 -0700)]
minmax: simplify and clarify min_t()/max_t() implementation

This simplifies the min_t() and max_t() macros by no longer making them
work in the context of a C constant expression.

That means that you can no longer use them for static initializers or
for array sizes in type definitions, but there were only a couple of
such uses, and all of them were converted (famous last words) to use
MIN_T/MAX_T instead.

Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 months agominmax: add a few more MIN_T/MAX_T users
Linus Torvalds [Sun, 28 Jul 2024 20:03:48 +0000 (13:03 -0700)]
minmax: add a few more MIN_T/MAX_T users

Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.

The complexity of those macros stems from two issues:

 (a) trying to use them in situations that require a C constant
     expression (in static initializers and for array sizes)

 (b) the type sanity checking

and MIN_T/MAX_T avoids both of these issues.

Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.

But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.

However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.

This does exactly that.

Which in turn will then allow for much simpler implementations of
min_t()/max_t().  All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.

We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.

Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 months agoMerge tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 28 Jul 2024 18:51:51 +0000 (11:51 -0700)]
Merge tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI and UBIFS updates from Richard Weinberger:

 - Many fixes for power-cut issues by Zhihao Cheng

 - Another ubiblock error path fix

 - ubiblock section mismatch fix

 - Misc fixes all over the place

* tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: Fix ubi_init() ubiblock_exit() section mismatch
  ubifs: add check for crypto_shash_tfm_digest
  ubifs: Fix inconsistent inode size when powercut happens during appendant writing
  ubi: block: fix null-pointer-dereference in ubiblock_create()
  ubifs: fix kernel-doc warnings
  ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity
  mtd: ubi: Restore missing cleanup on ubi_init() failure path
  ubifs: dbg_orphan_check: Fix missed key type checking
  ubifs: Fix unattached inode when powercut happens in creating
  ubifs: Fix space leak when powercut happens in linking tmpfile
  ubifs: Move ui->data initialization after initializing security
  ubifs: Fix adding orphan entry twice for the same inode
  ubifs: Remove insert_dead_orphan from replaying orphan process
  Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"
  ubifs: Don't add xattr inode into orphan area
  ubifs: Fix unattached xattr inode if powercut happens after deleting
  mtd: ubi: avoid expensive do_div() on 32-bit machines
  mtd: ubi: make ubi_class constant
  ubi: eba: properly rollback inside self_check_eba

9 months agokbuild: Fix '-S -c' in x86 stack protector scripts
Nathan Chancellor [Fri, 26 Jul 2024 18:05:00 +0000 (11:05 -0700)]
kbuild: Fix '-S -c' in x86 stack protector scripts

After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:

  $ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
  clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]

This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.

'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.

All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.

Cc: stable@vger.kernel.org
Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoubi: Fix ubi_init() ubiblock_exit() section mismatch
Richard Weinberger [Sat, 13 Jul 2024 07:35:19 +0000 (09:35 +0200)]
ubi: Fix ubi_init() ubiblock_exit() section mismatch

Since ubiblock_exit() is now called from an init function,
the __exit section no longer makes sense.

Cc: Ben Hutchings <bwh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407131403.wZJpd8n2-lkp@intel.com/
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
9 months agoMerge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Linus Torvalds [Sun, 28 Jul 2024 17:52:15 +0000 (10:52 -0700)]
Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Enable turbostat extensions to add both perf and PMT (Intel
   Platform Monitoring Technology) counters via the cmdline

 - Demonstrate PMT access with built-in support for Meteor Lake's
   Die C6 counter

* tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2024.07.26
  tools/power turbostat: Include umask=%x in perf counter's config
  tools/power turbostat: Document PMT in turbostat.8
  tools/power turbostat: Add MTL's PMT DC6 builtin counter
  tools/power turbostat: Add early support for PMT counters
  tools/power turbostat: Add selftests for added perf counters
  tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
  tools/power turbostat: Move verbose counter messages to level 2
  tools/power turbostat: Move debug prints from stdout to stderr
  tools/power turbostat: Fix typo in turbostat.8
  tools/power turbostat: Add perf added counter example to turbostat.8
  tools/power turbostat: Fix formatting in turbostat.8
  tools/power turbostat: Extend --add option with perf counters
  tools/power turbostat: Group SMI counter with APERF and MPERF
  tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
  tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
  tools/power turbostat: Remove anonymous union from rapl_counter_info_t
  tools/power/turbostat: Switch to new Intel CPU model defines

9 months agoMerge tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Sun, 28 Jul 2024 16:33:28 +0000 (09:33 -0700)]
Merge tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL updates from Dave Jiang:
 "Core:

   - A CXL maturity map has been added to the documentation to detail
     the current state of CXL enabling.

     It provides the status of the current state of various CXL features
     to inform current and future contributors of where things are and
     which areas need contribution.

   - A notifier handler has been added in order for a newly created CXL
     memory region to trigger the abstract distance metrics calculation.

     This should bring parity for CXL memory to the same level vs
     hotplugged DRAM for NUMA abstract distance calculation. The
     abstract distance reflects relative performance used for memory
     tiering handling.

   - An addition for XOR math has been added to address the CXL DPA to
     SPA translation.

     CXL address translation did not support address interleave math
     with XOR prior to this change.

  Fixes:

   - Fix to address race condition in the CXL memory hotplug notifier

   - Add missing MODULE_DESCRIPTION() for CXL modules

   - Fix incorrect vendor debug UUID define

  Misc:

   - A warning has been added to inform users of an unsupported
     configuration when mixing CXL VH and RCH/RCD hierarchies

   - The ENXIO error code has been replaced with EBUSY for inject poison
     limit reached via debugfs and cxl-test support

   - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
     unnecessary PCI config reads

   - A refactor to a common struct for DRAM and general media CXL
     events"

* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/core/pci: Move reading of control register to immediately before usage
  cxl: Remove defunct code calculating host bridge target positions
  cxl/region: Verify target positions using the ordered target list
  cxl: Restore XOR'd position bits during address translation
  cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
  cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
  cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
  cxl/core: Fix incorrect vendor debug UUID define
  Documentation: CXL Maturity Map
  cxl/region: Simplify cxl_region_nid()
  cxl/region: Support to calculate memory tier abstract distance
  cxl/region: Fix a race condition in memory hotplug notifier
  cxl: add missing MODULE_DESCRIPTION() macros
  cxl/events: Use a common struct for DRAM and General Media events

9 months agoMerge tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisma...
Linus Torvalds [Sun, 28 Jul 2024 16:14:11 +0000 (09:14 -0700)]
Merge tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode update from Gabriel Krisman Bertazi:
 "Two small fixes to silence the compiler and static analyzers tools
  from Ben Dooks and Jeff Johnson"

* tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: add MODULE_DESCRIPTION() macros
  unicode: make utf8 test count static

9 months agokbuild: rpm-pkg: ghost modules.weakdep file
Jose Ignacio Tornos Martinez [Fri, 26 Jul 2024 09:00:26 +0000 (11:00 +0200)]
kbuild: rpm-pkg: ghost modules.weakdep file

In the same way as for other similar files, mark as ghost the new file
generated by depmod for configured weak dependencies for modules,
modules.weakdep, so that although it is not included in the package,
claim the ownership on it.

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
9 months agoMerge tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 28 Jul 2024 03:08:07 +0000 (20:08 -0700)]
Merge tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

 - fix for potential null pointer use in init cifs

 - additional dynamic trace points to improve debugging of some common
   scenarios

 - two SMB1 fixes (one addressing reconnect with POSIX extensions, one a
   mount parsing error)

* tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: add dynamic trace point for session setup key expired failures
  smb3: add four dynamic tracepoints for copy_file_range and reflink
  smb3: add dynamic tracepoint for reflink errors
  cifs: mount with "unix" mount option for SMB1 incorrectly handled
  cifs: fix reconnect with SMB1 UNIX Extensions
  cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path

9 months agoMerge tag 'block-6.11-20240726' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 27 Jul 2024 22:28:53 +0000 (15:28 -0700)]
Merge tag 'block-6.11-20240726' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Fix request without payloads cleanup  (Leon)
     - Use new protection information format (Francis)
     - Improved debug message for lost pci link (Bart)
     - Another apst quirk (Wang)
     - Use appropriate sysfs api for printing chars (Markus)

 - ublk async device deletion fix (Ming)

 - drbd kerneldoc fixups (Simon)

 - Fix deadlock between sd removal and release (Yang)

* tag 'block-6.11-20240726' of git://git.kernel.dk/linux:
  nvme-pci: add missing condition check for existence of mapped data
  ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling
  block: fix deadlock between sd_remove & sd_release
  drbd: Add peer_device to Kernel doc
  nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
  nvme-pci: Fix the instructions for disabling power management
  nvme: remove redundant bdev local variable
  nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
  nvme/pci: Add APST quirk for Lenovo N60z laptop

9 months agoMerge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 27 Jul 2024 22:22:33 +0000 (15:22 -0700)]
Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix a syzbot issue for the msg ring cache added in this release. No
   ill effects from this one, but it did make KMSAN unhappy (me)

 - Sanitize the NAPI timeout handling, by unifying the value handling
   into all ktime_t rather than converting back and forth (Pavel)

 - Fail NAPI registration for IOPOLL rings, it's not supported (Pavel)

 - Fix a theoretical issue with ring polling and cancelations (Pavel)

 - Various little cleanups and fixes (Pavel)

* tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux:
  io_uring/napi: pass ktime to io_napi_adjust_timeout
  io_uring/napi: use ktime in busy polling
  io_uring/msg_ring: fix uninitialized use of target_req->flags
  io_uring: align iowq and task request error handling
  io_uring: kill REQ_F_CANCEL_SEQ
  io_uring: simplify io_uring_cmd return
  io_uring: fix io_match_task must_hold
  io_uring: don't allow netpolling with SETUP_IOPOLL
  io_uring: tighten task exit cancellations

9 months agoMerge tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 27 Jul 2024 22:11:59 +0000 (15:11 -0700)]
Merge tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "This contains two fixes for this merge window:

  VFS:

   - I noticed that it is possible for a privileged user to mount most
     filesystems with a non-initial user namespace in sb->s_user_ns.

     When fsopen() is called in a non-init namespace the caller's
     namespace is recorded in fs_context->user_ns. If the returned file
     descriptor is then passed to a process privileged in init_user_ns,
     that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE*),
     creating a new superblock with sb->s_user_ns set to the namespace
     of the process which called fsopen().

     This is problematic as only filesystems that raise FS_USERNS_MOUNT
     are known to be able to support a non-initial s_user_ns. Others may
     suffer security issues, on-disk corruption or outright crash the
     kernel. Prevent that by restricting such delegation to filesystems
     that allow FS_USERNS_MOUNT.

     Note, that this delegation requires a privileged process to
     actually create the superblock so either the privileged process is
     cooperaing or someone must have tricked a privileged process into
     operating on a fscontext file descriptor whose origin it doesn't
     know (a stupid idea).

     The bug dates back to about 5 years afaict.

  Misc:

   - Fix hostfs parsing when the mount request comes in via the legacy
     mount api.

     In the legacy mount api hostfs allows to specify the host directory
     mount without any key.

     Restore that behavior"

* tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  hostfs: fix the host directory parse when mounting.
  fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT

9 months agoMerge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux
Linus Torvalds [Sat, 27 Jul 2024 20:44:54 +0000 (13:44 -0700)]
Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux

Pull Rust updates from Miguel Ojeda:
 "The highlight is the establishment of a minimum version for the Rust
  toolchain, including 'rustc' (and bundled tools) and 'bindgen'.

  The initial minimum will be the pinned version we currently have, i.e.
  we are just widening the allowed versions. That covers three stable
  Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow),
  plus beta, plus nightly.

  This should already be enough for kernel developers in distributions
  that provide recent Rust compiler versions routinely, such as Arch
  Linux, Debian Unstable (outside the freeze period), Fedora Linux,
  Gentoo Linux (especially the testing channel), Nix (unstable) and
  openSUSE Slowroll and Tumbleweed.

  In addition, the kernel is now being built-tested by Rust's pre-merge
  CI. That is, every change that is attempting to land into the Rust
  compiler is tested against the kernel, and it is merged only if it
  passes. Similarly, the bindgen tool has agreed to build the kernel in
  their CI too.

  Thus, with the pre-merge CI in place, both projects hope to avoid
  unintentional changes to Rust that break the kernel. This means that,
  in general, apart from intentional changes on their side (that we will
  need to workaround conditionally on our side), the upcoming Rust
  compiler versions should generally work.

  In addition, the Rust project has proposed getting the kernel into
  stable Rust (at least solving the main blockers) as one of its three
  flagship goals for 2024H2 [1].

  I would like to thank Niko, Sid, Emilio et al. for their help
  promoting the collaboration between Rust and the kernel.

  Toolchain and infrastructure:

   - Support several Rust toolchain versions.

   - Support several bindgen versions.

   - Remove 'cargo' requirement and simplify 'rusttest', thanks to
     'alloc' having been dropped last cycle.

   - Provide proper error reporting for the 'rust-analyzer' target.

  'kernel' crate:

   - Add 'uaccess' module with a safe userspace pointers abstraction.

   - Add 'page' module with a 'struct page' abstraction.

   - Support more complex generics in workqueue's 'impl_has_work!'
     macro.

  'macros' crate:

   - Add 'firmware' field support to the 'module!' macro.

   - Improve 'module!' macro documentation.

  Documentation:

   - Provide instructions on what packages should be installed to build
     the kernel in some popular Linux distributions.

   - Introduce the new kernel.org LLVM+Rust toolchains.

   - Explain '#[no_std]'.

  And a few other small bits"

Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals
* tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits)
  docs: rust: quick-start: add section on Linux distributions
  rust: warn about `bindgen` versions 0.66.0 and 0.66.1
  rust: start supporting several `bindgen` versions
  rust: work around `bindgen` 0.69.0 issue
  rust: avoid assuming a particular `bindgen` build
  rust: start supporting several compiler versions
  rust: simplify Clippy warning flags set
  rust: relax most deny-level lints to warnings
  rust: allow `dead_code` for never constructed bindings
  rust: init: simplify from `map_err` to `inspect_err`
  rust: macros: indent list item in `paste!`'s docs
  rust: add abstraction for `struct page`
  rust: uaccess: add typed accessors for userspace pointers
  uaccess: always export _copy_[from|to]_user with CONFIG_RUST
  rust: uaccess: add userspace pointers
  kbuild: rust-analyzer: improve comment documentation
  kbuild: rust-analyzer: better error handling
  docs: rust: no_std is used
  rust: alloc: add __GFP_HIGHMEM flag
  rust: alloc: fix typo in docs for GFP_NOWAIT
  ...

9 months agoMerge tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 27 Jul 2024 20:28:39 +0000 (13:28 -0700)]
Merge tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
 "Cleanups
   - optimization: try to avoid refing the label in apparmor_file_open
   - remove useless static inline function is_deleted
   - use kvfree_sensitive to free data->data
   - fix typo in kernel doc

  Bug fixes:
   - unpack transition table if dfa is not present
   - test: add MODULE_DESCRIPTION()
   - take nosymfollow flag into account
   - fix possible NULL pointer dereference
   - fix null pointer deref when receiving skb during sock creation"

* tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: unpack transition table if dfa is not present
  apparmor: try to avoid refing the label in apparmor_file_open
  apparmor: test: add MODULE_DESCRIPTION()
  apparmor: take nosymfollow flag into account
  apparmor: fix possible NULL pointer dereference
  apparmor: fix typo in kernel doc
  apparmor: remove useless static inline function is_deleted
  apparmor: use kvfree_sensitive to free data->data
  apparmor: Fix null pointer deref when receiving skb during sock creation

9 months agoMerge tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sat, 27 Jul 2024 20:16:53 +0000 (13:16 -0700)]
Merge tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fix from Mickaël Salaün:
 "Jann Horn reported a sandbox bypass for Landlock. This includes the
  fix and new tests. This should be backported"

* tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add cred_transfer test
  landlock: Don't lose track of restrictions on cred_transfer

9 months agoMerge tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 19:54:06 +0000 (12:54 -0700)]
Merge tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:

 - don't use sprintf() with non-constant format string

* tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: virtuser: avoid non-constant format string

9 months agoMerge tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 19:46:16 +0000 (12:46 -0700)]
Merge tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more devicetree updates from Rob Herring:
 "Most of this is a treewide change to of_property_for_each_u32() which
  was small enough to do in one go before rc1 and avoids the need to
  create of_property_for_each_u32_some_new_name().

   - Treewide conversion of of_property_for_each_u32() to drop internal
     arguments making struct property opaque

   - Add binding for Amlogic A4 SoC watchdog

   - Fix constraints for AD7192 'single-channel' property"

* tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraints
  of: remove internal arguments from of_property_for_each_u32()
  dt-bindings: watchdog: add support for Amlogic A4 SoCs

9 months agoMerge tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 27 Jul 2024 19:39:55 +0000 (12:39 -0700)]
Merge tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Will Deacon:
 "We're still resolving a regression with the handling of unexpected
  page faults on SMMUv3, but we're not quite there with a fix yet.

   - Fix NULL dereference when freeing domain in Unisoc SPRD driver

   - Separate assignment statements with semicolons in AMD page-table
     code

   - Fix Tegra erratum workaround when the CPU is using 16KiB pages"

* tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu: arm-smmu: Fix Tegra workaround for PAGE_SIZE mappings
  iommu/amd: Convert comma to semicolon
  iommu: sprd: Avoid NULL deref in sprd_iommu_hw_en

9 months agoMerge tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 19:35:12 +0000 (12:35 -0700)]
Merge tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fixes from Takashi Sakamoto:
 "The recent integration of compiler collections introduced the
  technology to check flexible array length at runtime by providing
  proper annotations. In v6.10 kernel, a patch was merged into firewire
  subsystem to utilize it, however the annotation was inadequate.

  There is also the related change for the flexible array in sound
  subsystem, but it causes a regression where the data in the payload of
  isochronous packet is incorrect for some devices. These bugs are now
  fixed"

* tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case
  Revert "firewire: Annotate struct fw_iso_packet with __counted_by()"

9 months agoMerge tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 19:29:10 +0000 (12:29 -0700)]
Merge tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "The bulk of this is a series of fixes for the microchip-core driver
  mostly originating from one of their customers, I also applied an
  additional patch adding support for controlling the word size which
  came along with it since it's still the merge window and clearly had a
  bunch of fairly thorough testing.

  We also have a fix for the compatible used to bind spidev to the
  BH2228FV"

* tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spidev: add correct compatible for Rohm BH2228FV
  dt-bindings: trivial-devices: fix Rohm BH2228FV compatible string
  spi: microchip-core: add support for word sizes of 1 to 32 bits
  spi: microchip-core: ensure TX and RX FIFOs are empty at start of a transfer
  spi: microchip-core: fix init function not setting the master and motorola modes
  spi: microchip-core: only disable SPI controller when register value change requires it
  spi: microchip-core: defer asserting chip select until just before write to TX FIFO
  spi: microchip-core: fix the issues in the isr

9 months agoMerge tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 27 Jul 2024 19:27:52 +0000 (12:27 -0700)]
Merge tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "These two commits clean up the excessively loose dependencies for the
  RZG2L USB VBCTRL regulator driver, ensuring it shouldn't prompt for
  people who can't use it"

* tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Further restrict RZG2L USB VBCTRL regulator dependencies
  regulator: renesas-usb-vbus-regulator: Update the default

9 months agoMerge tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sat, 27 Jul 2024 19:26:09 +0000 (12:26 -0700)]
Merge tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "Arnd sent a workaround for a false positive warning which was showing
  up with GCC 14.1"

* tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: maple: work around gcc-14.1 false-positive warning

9 months agoMerge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds [Sat, 27 Jul 2024 19:07:18 +0000 (12:07 -0700)]
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A few clk driver fixes for the merge window to fix the build and boot
  on some SoCs.

   - Initialize struct clk_init_data in the TI da8xx-cfgchip driver so
     that stack contents aren't used for things like clk flags leading
     to unexpected behavior

   - Don't leak stack contents in a debug print in the new Sophgo clk
     driver

   - Disable the new T-Head clk driver on 32-bit targets to fix the
     build due to a division

   - Fix Samsung Exynos4 fin_pll wreckage from the clkdev rework done
     last cycle by using a struct clk_hw directly instead of a struct
     clk consumer"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: samsung: fix getting Exynos4 fin_pll rate from external clocks
  clk: T-Head: Disable on 32-bit Targets
  clk: sophgo: clk-sg2042-pll: Fix uninitialized variable in debug output
  clk: davinci: da8xx-cfgchip: Initialize clk_init_data before use

9 months agoMerge tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sat, 27 Jul 2024 17:53:06 +0000 (10:53 -0700)]
Merge tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "This cycle, there are new features for the Designware controller and
  fixes for the other IPs:

   - dw: optional apb clock and power management support, IBI handling
     fixes

   - mipi-i3c-hci: IBI handling fixes

   - svc: a few fixes"

* tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  dt-bindings: i3c: add header for generic I3C flags
  i3c: master: svc: Fix error code in svc_i3c_master_do_daa_locked()
  i3c: master: Enhance i3c_bus_type visibility for device searching & event monitoring
  i3c: dw: Add power management support
  i3c: dw: Add some functions for reusability
  i3c: dw: Save timing registers and other values
  i3c: master: svc: Improve DAA STOP handle code logic
  i3c: dw: Add optional apb clock
  i3c: dw: Use new *_enabled clk API
  dt-bindings: i3c: dw: Add apb clock binding
  i3c: master: svc: Convert comma to semicolon
  i3c: mipi-i3c-hci: Round IBI data chunk size to HW supported value
  i3c: mipi-i3c-hci: Error out instead on BUG_ON() in IBI DMA setup
  i3c: mipi-i3c-hci: Set IBI Status and Data Ring base addresses
  i3c: mipi-i3c-hci: Switch to lower_32_bits()/upper_32_bits() helpers
  i3c: dw: Remove ibi_capable property
  i3c: dw: Fix IBI intr programming
  i3c: dw: Fix clearing queue thld
  i3c: mipi-i3c-hci: Fix number of DAT/DCT entries for HCI versions < 1.1
  i3c: master: svc: resend target address when get NACK

9 months agoMerge tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafae...
Linus Torvalds [Sat, 27 Jul 2024 17:44:49 +0000 (10:44 -0700)]
Merge tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
 "Prevent the thermal core from flooding the kernel log with useless
  messages if thermal zone temperature can never be determined (or its
  sensor has failed permanently) and make it finally give up and disable
  defective thermal zones (Rafael Wysocki)"

* tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Back off when polling thermal zones on errors
  thermal: trip: Split thermal_zone_device_set_mode()

9 months agoMerge tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 27 Jul 2024 17:26:41 +0000 (10:26 -0700)]
Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "11 hotfixes, 7 of which are cc:stable.  7 are MM, 4 are other"

* tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: handle inconsistent state in nilfs_btnode_create_block()
  selftests/mm: skip test for non-LPA2 and non-LVA systems
  mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node
  alloc_tag: outline and export free_reserved_page()
  decompress_bunzip2: fix rare decompression failure
  mm/huge_memory: avoid PMD-size page cache if needed
  mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
  mm: fix old/young bit handling in the faulting path
  dt-bindings: arm: update James Clark's email address
  MAINTAINERS: mailmap: update James Clark's email address

9 months agoMerge tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 17:19:55 +0000 (10:19 -0700)]
Merge tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer migration updates from Thomas Gleixner:
 "Fixes and minor updates for the timer migration code:

   - Stop testing the group->parent pointer as it is not guaranteed to
     be stable over a chain of operations by design.

     This includes a warning which would be nice to have but it produces
     false positives due to the racy nature of the check.

   - Plug a race between CPUs going in and out of idle and a CPU hotplug
     operation. The latter can create and connect a new hierarchy level
     which is missed in the concurrent updates of CPUs which go into
     idle. As a result the events of such a CPU might not be processed
     and timers go stale.

     Cure it by splitting the hotplug operation into a prepare and
     online callback. The prepare callback is guaranteed to run on an
     online and therefore active CPU. This CPU updates the hierarchy and
     being online ensures that there is always at least one migrator
     active which handles the modified hierarchy correctly when going
     idle. The online callback which runs on the incoming CPU then just
     marks the CPU active and brings it into operation.

   - Improve tracing and polish the code further so it is more obvious
     what's going on"

* tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Fix grammar in comment
  timers/migration: Spare write when nothing changed
  timers/migration: Rename childmask by groupmask to make naming more obvious
  timers/migration: Read childmask and parent pointer in a single place
  timers/migration: Use a single struct for hierarchy walk data
  timers/migration: Improve tracing
  timers/migration: Move hierarchy setup into cpuhotplug prepare callback
  timers/migration: Do not rely always on group->parent

9 months agoMerge tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 17:14:34 +0000 (10:14 -0700)]
Merge tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and
   cache info (via PPTT) on ACPI-based systems.

 - The trap entry/exit code no longer breaks the return address stack
   predictor on many systems, which results in an improvement to trap
   latency.

 - Support for HAVE_ARCH_STACKLEAK.

 - The sv39 linear map has been extended to support 128GiB mappings.

 - The frequency of the mtime CSR is now visible via hwprobe.

* tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  RISC-V: Provide the frequency of time CSR via hwprobe
  riscv: Extend sv39 linear mapping max size to 128G
  riscv: enable HAVE_ARCH_STACKLEAK
  riscv: signal: Remove unlikely() from WARN_ON() condition
  riscv: Improve exception and system call latency
  RISC-V: Select ACPI PPTT drivers
  riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT
  riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init()
  RISC-V: ACPI: Enable SPCR table for console output on RISC-V
  riscv: boot: remove duplicated targets line
  trace: riscv: Remove deprecated kprobe on ftrace support
  riscv: cpufeature: Extract common elements from extension checking
  riscv: Introduce vendor variants of extension helpers
  riscv: Add vendor extensions to /proc/cpuinfo
  riscv: Extend cpufeature.c to detect vendor extensions
  RISC-V: run savedefconfig for defconfig
  RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically
  ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init
  ACPI: NUMA: change the ACPI_NUMA to a hidden option
  ACPI: NUMA: Add handler for SRAT RINTC affinity structure
  ...

9 months agoMerge tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Jul 2024 16:58:24 +0000 (09:58 -0700)]
Merge tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two fixes for issues introduced in this merge window:

   - fix enhanced debugging in the Xen multicall handling

   - two patches fixing a boot failure when running as dom0 in PVH mode"

* tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: fix memblock_reserve() usage on PVH
  x86/xen: move xen_reserve_extra_memory()
  xen: fix multicall debug data referencing

9 months agohostfs: fix the host directory parse when mounting.
Hongbo Li [Thu, 25 Jul 2024 06:51:30 +0000 (14:51 +0800)]
hostfs: fix the host directory parse when mounting.

hostfs not keep the host directory when mounting. When the host
directory is none (default), fc->source is used as the host root
directory, and this is wrong. Here we use `parse_monolithic` to
handle the old mount path for parsing the root directory. For new
mount path, The `parse_param` is used for the host directory parse.

Reported-and-tested-by: Maciej Żenczykowski <maze@google.com>
Fixes: cd140ce9f611 ("hostfs: convert hostfs to use the new mount API")
Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20240725065130.1821964-1-lihongbo22@huawei.com
[brauner: minor fixes]
Signed-off-by: Christian Brauner <brauner@kernel.org>
9 months agofs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT
Seth Forshee (DigitalOcean) [Wed, 24 Jul 2024 14:53:59 +0000 (09:53 -0500)]
fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT

Christian noticed that it is possible for a privileged user to mount
most filesystems with a non-initial user namespace in sb->s_user_ns.
When fsopen() is called in a non-init namespace the caller's namespace
is recorded in fs_context->user_ns. If the returned file descriptor is
then passed to a process priviliged in init_user_ns, that process can
call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
with sb->s_user_ns set to the namespace of the process which called
fsopen().

This is problematic. We cannot assume that any filesystem which does not
set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
mind, increasing the risk for bugs and security issues.

Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
not set for the filesystem and a non-initial user namespace will be
used. sget() does not need to be updated as it always uses the user
namespace of the current context, or the initial user namespace if
SB_SUBMOUNT is set.

Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
Reported-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
9 months agoALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case
Takashi Sakamoto [Thu, 25 Jul 2024 15:56:40 +0000 (00:56 +0900)]
ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case

In a commit 1d717123bb1a ("ALSA: firewire-lib: Avoid
-Wflex-array-member-not-at-end warning"), DEFINE_FLEX() macro was used to
handle variable length of array for header field in struct fw_iso_packet
structure. The usage of macro has a side effect that the designated
initializer assigns the count of array to the given field. Therefore
CIP_HEADER_QUADLETS (=2) is assigned to struct fw_iso_packet.header,
while the original designated initializer assigns zero to all fields.

With CIP_NO_HEADER flag, the change causes invalid length of header in
isochronous packet for 1394 OHCI IT context. This bug affects all of
devices supported by ALSA fireface driver; RME Fireface 400, 800, UCX, UFX,
and 802.

This commit fixes the bug by replacing it with the alternative version of
macro which corresponds no initializer.

Cc: stable@vger.kernel.org
Fixes: 1d717123bb1a ("ALSA: firewire-lib: Avoid -Wflex-array-member-not-at-end warning")
Reported-by: Edmund Raile <edmund.raile@proton.me>
Closes: https://lore.kernel.org/r/rrufondjeynlkx2lniot26ablsltnynfaq2gnqvbiso7ds32il@qk4r6xps7jh2/
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20240725155640.128442-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
9 months agoRevert "firewire: Annotate struct fw_iso_packet with __counted_by()"
Takashi Sakamoto [Thu, 25 Jul 2024 16:16:48 +0000 (01:16 +0900)]
Revert "firewire: Annotate struct fw_iso_packet with __counted_by()"

This reverts commit d3155742db89df3b3c96da383c400e6ff4d23c25.

The header_length field is byte unit, thus it can not express the number of
elements in header field. It seems that the argument for counted_by
attribute can have no arithmetic expression, therefore this commit just
reverts the issued commit.

Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240725161648.130404-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>