]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
15 months agokconfig: remove P_CHOICE property
Masahiro Yamada [Tue, 18 Jun 2024 10:35:30 +0000 (19:35 +0900)]
kconfig: remove P_CHOICE property

P_CHOICE is a pseudo property used to link a choice with its members.

There is no more code relying on this, except for some debug code.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: use sym_get_choice_menu() in sym_check_deps()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:29 +0000 (19:35 +0900)]
kconfig: use sym_get_choice_menu() in sym_check_deps()

Choices and their members are associated via the P_CHOICE property.

Currently, prop_get_symbol(sym_get_choice_prop()) is used to obtain
the choice of the given choice member.

Replace it with sym_get_choice_menu(), which retrieves the choice
without relying on P_CHOICE.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: use sym_get_choice_menu() in sym_check_choice_deps()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:28 +0000 (19:35 +0900)]
kconfig: use sym_get_choice_menu() in sym_check_choice_deps()

Choices and their members are associated via the P_CHOICE property.

Currently, prop_get_symbol(sym_get_choice_prop()) is used to obtain
the choice of the given choice member.

Replace it with sym_get_choice_menu(), which retrieves the choice
without relying on P_CHOICE.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: use sym_get_choice_menu() in sym_check_print_recursive()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:27 +0000 (19:35 +0900)]
kconfig: use sym_get_choice_menu() in sym_check_print_recursive()

Choices and their members are associated via the P_CHOICE property.

Currently, prop_get_symbol(sym_get_choice_prop()) is used to obtain
the choice of the given choice member.

Replace it with sym_get_choice_menu(), which retrieves the choice
without relying on P_CHOICE.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: remove expr_list_for_each_sym() macro
Masahiro Yamada [Tue, 18 Jun 2024 10:35:26 +0000 (19:35 +0900)]
kconfig: remove expr_list_for_each_sym() macro

All users of this macro have been converted. Remove it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: use menu_list_for_each_sym() in sym_choice_default()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:25 +0000 (19:35 +0900)]
kconfig: use menu_list_for_each_sym() in sym_choice_default()

Choices and their members are associated via the P_CHOICE property.

Currently, sym_get_choice_prop() and expr_list_for_each_sym() are
used to iterate on choice members.

Replace them with menu_for_each_sub_entry(), which achieves the same
without relying on P_CHOICE.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: change sym_choice_default() to take the choice menu
Masahiro Yamada [Tue, 18 Jun 2024 10:35:24 +0000 (19:35 +0900)]
kconfig: change sym_choice_default() to take the choice menu

Change the argument of sym_choice_default() to ease further cleanups.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: remove conf_unsaved in conf_read_simple()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:23 +0000 (19:35 +0900)]
kconfig: remove conf_unsaved in conf_read_simple()

This variable is unnecessary. Call conf_set_changed(true) directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: remove sym_get_choice_value()
Masahiro Yamada [Tue, 18 Jun 2024 10:35:22 +0000 (19:35 +0900)]
kconfig: remove sym_get_choice_value()

sym_get_choice_value(menu->sym) is equivalent to sym_calc_choice(menu).

Convert all call sites of sym_get_choice_value() and then remove it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: refactor choice value calculation
Masahiro Yamada [Tue, 18 Jun 2024 10:35:21 +0000 (19:35 +0900)]
kconfig: refactor choice value calculation

Handling choices has always been in a PITA in Kconfig.

For example, fixes and reverts were repeated for randconfig with
KCONFIG_ALLCONFIG:

 - 422c809f03f0 ("kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG")
 - 23a5dfdad22a ("Revert "kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG"")
 - 8357b48549e1 ("kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG")
 - 490f16171119 ("Revert "kconfig: fix randomising choice entries in presence of KCONFIG_ALLCONFIG"")

As these commits pointed out, randconfig does not randomize choices when
KCONFIG_ALLCONFIG is used. This issue still remains.

[Test Case]

    choice
            prompt "choose"

    config A
            bool "A"

    config B
            bool "B"

    endchoice

    $ echo > all.config
    $ make KCONFIG_ALLCONFIG=1 randconfig

The output is always as follows:

    CONFIG_A=y
    # CONFIG_B is not set

Not only randconfig, but other all*config variants are also broken with
KCONFIG_ALLCONFIG.

With the same Kconfig,

    $ echo '# CONFIG_A is not set' > all.config
    $ make KCONFIG_ALLCONFIG=1 allyesconfig

You will get this:

    CONFIG_A=y
    # CONFIG_B is not set

This is incorrect because it does not respect all.config.

The correct output should be:

    # CONFIG_A is not set
    CONFIG_B=y

To handle user inputs more accurately, this commit refactors the code
based on the following principles:

 - When a user value is given, Kconfig must set it immediately.
   Do not defer it by setting SYMBOL_NEED_SET_CHOICE_VALUES.

 - The SYMBOL_DEF_USER flag must not be cleared, unless a new config
   file is loaded. Kconfig must not forget user inputs.

In addition, user values for choices must be managed with priority.
If user inputs conflict within a choice block, the newest value wins.
The values given by randconfig have lower priority than explicit user
inputs.

This commit implements it by using a linked list. Every time a choice
block gets a new input, it is moved to the top of the list.

Let me explain how it works.

Let's say, we have a choice block that consists of five symbols:
A, B, C, D, and E.

Initially, the linked list looks like this:

    A(=?) --> B(=?) --> C(=?) --> D(=?) --> E(=?)

Suppose randconfig is executed with the following KCONFIG_ALLCONFIG:

    CONFIG_C=y
    # CONFIG_A is not set
    CONFIG_D=y

First, CONFIG_C=y is read. C is set to 'y' and moved to the top.

    C(=y) --> A(=?) --> B(=?) --> D(=?) --> E(=?)

Next, '# CONFIG_A is not set' is read. A is set to 'n' and moved to
the top.

    A(=n) --> C(=y) --> B(=?) --> D(=?) --> E(=?)

Then, 'CONFIG_D=y' is read. D is set to 'y' and moved to the top.

    D(=y) --> A(=n) --> C(=y) --> B(=?) --> E(=?)

Lastly, randconfig shuffles the order of the remaining symbols,
resulting in:

    D(=y) --> A(=n) --> C(=y) --> B(=y) --> E(=y)
or
    D(=y) --> A(=n) --> C(=y) --> E(=y) --> B(=y)

When calculating the output, the linked list is traversed and the first
visible symbol with 'y' is taken. In this case, it is D if visible.

If D is hidden by 'depends on', the next node, A, is examined. Since
it is already specified as 'n', it is skipped. Next, C is checked, and
selected if it is visible.

If C is also invisible, either B or E is chosen as a result of the
randomization.

If B and E are also invisible, the linked list is traversed in the
reverse order, and the least prioritized 'n' symbol is chosen. It is
A in this case.

Now, Kconfig remembers all user values. This is a big difference from
the previous implementation, where Kconfig would forget CONFIG_C=y when
CONFIG_D=y appeared in the same input file.

The new appaorch respects user-specified values as much as possible.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: import list_move(_tail) and list_for_each_entry_reverse macros
Masahiro Yamada [Tue, 18 Jun 2024 10:35:20 +0000 (19:35 +0900)]
kconfig: import list_move(_tail) and list_for_each_entry_reverse macros

Import more macros from include/linux/list.h.

These will be used in the next commit.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agoscripts/make_fit: Support decomposing DTBs
Chen-Yu Tsai [Thu, 13 Jun 2024 09:34:32 +0000 (17:34 +0800)]
scripts/make_fit: Support decomposing DTBs

The kernel tree builds some "composite" DTBs, where the final DTB is the
result of applying one or more DTB overlays on top of a base DTB with
fdtoverlay.

The FIT image specification already supports configurations having one
base DTB and overlays applied on top. It is then up to the bootloader to
apply said overlays and either use or pass on the final result. This
allows the FIT image builder to reuse the same FDT images for multiple
configurations, if such cases exist.

The decomposition function depends on the kernel build system, reading
back the .cmd files for the to-be-packaged DTB files to check for the
fdtoverlay command being called. This will not work outside the kernel
tree. The function is off by default to keep compatibility with possible
existing users.

To facilitate the decomposition and keep the code clean, the model and
compatitble string extraction have been moved out of the output_dtb
function. The FDT image description is replaced with the base file name
of the included image.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokbuild: rpm-pkg: make sure to have versioned 'Obsoletes' for kernel.spec
Rafael Aquini [Tue, 11 Jun 2024 21:11:21 +0000 (17:11 -0400)]
kbuild: rpm-pkg: make sure to have versioned 'Obsoletes' for kernel.spec

Fix the following rpmbuild warning:

  $ make srcrpm-pkg
  ...
  RPM build warnings:
      line 34: It's not recommended to have unversioned Obsoletes: Obsoletes: kernel-headers

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agomodpost: Enable section warning from *driver to .exit.text
Uwe Kleine-König [Tue, 11 Jun 2024 20:59:00 +0000 (22:59 +0200)]
modpost: Enable section warning from *driver to .exit.text

There used to be several offenders, but now that for all of them patches
were sent and most of them were applied, enable the warning also for
builds without W=1.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokbuild: move init/build-version to scripts/
Masahiro Yamada [Tue, 11 Jun 2024 18:24:47 +0000 (03:24 +0900)]
kbuild: move init/build-version to scripts/

At first, I thought this script would be needed only in init/Makefile.

However, commit 5db8face97f8 ("kbuild: Restore .version auto-increment
behaviour for Debian packages") and commit 1789fc912541 ("kbuild:
rpm-pkg: invoke the kernel build from rpmbuild for binrpm-pkg")
revealed that it was actually needed for scripts/package/mk* as well.

After all, scripts/ is a better place for it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
15 months agokconfig: remember the current choice while parsing the choice block
Masahiro Yamada [Tue, 11 Jun 2024 17:55:13 +0000 (02:55 +0900)]
kconfig: remember the current choice while parsing the choice block

Instead of the boolean flag, it will be more useful to remember the
current choice being parsed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: introduce choice_set_value() helper
Masahiro Yamada [Tue, 11 Jun 2024 17:55:12 +0000 (02:55 +0900)]
kconfig: introduce choice_set_value() helper

Currently, sym_set_tristate_value() is used to set 'y' to a choice
member, which is confusing because it not only sets 'y' to the given
symbol but also tweaks flags of other symbols as a side effect.

Add a dedicated function for setting the value of the given choice.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: add fallthrough comments to expr_compare_type()
Masahiro Yamada [Tue, 11 Jun 2024 17:55:11 +0000 (02:55 +0900)]
kconfig: add fallthrough comments to expr_compare_type()

Clarify the missing 'break' is intentional.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: remove unneeded code in expr_compare_type()
Masahiro Yamada [Tue, 11 Jun 2024 17:55:10 +0000 (02:55 +0900)]
kconfig: remove unneeded code in expr_compare_type()

The condition (t2 == 0) never becomes true because the zero value
(i.e., E_NONE) is only used as a dummy type for prevtoken. It can
be passed to t1, but not to t2.

The caller of this function only checks expr_compare_type() > 0.
Therefore, the distinction between 0 and -1 is unnecessary.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: add -e and -u options to *conf-cfg.sh scripts
Masahiro Yamada [Tue, 11 Jun 2024 16:08:05 +0000 (01:08 +0900)]
kconfig: add -e and -u options to *conf-cfg.sh scripts

Set -e to make these scripts fail on the first error.

Set -u because these scripts are invoked by Makefile, and do not work
properly without necessary variables defined.

Both options are described in POSIX. [1]

[1]: https://pubs.opengroup.org/onlinepubs/009604499/utilities/set.html

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokbuild: merge temporary vmlinux for BTF and kallsyms
Masahiro Yamada [Mon, 10 Jun 2024 11:25:18 +0000 (20:25 +0900)]
kbuild: merge temporary vmlinux for BTF and kallsyms

CONFIG_DEBUG_INFO_BTF=y requires one additional link step.
(.tmp_vmlinux.btf)

CONFIG_KALLSYMS=y requires two additional link steps.
(.tmp_vmlinux.kallsyms1 and .tmp_vmlinux.kallsyms2)

Enabling both requires three additional link steps.

When CONFIG_DEBUG_INFO_BTF=y and CONFIG_KALLSYMS=y, the current build
process is as follows:

    KSYMS   .tmp_vmlinux.kallsyms0.S
    AS      .tmp_vmlinux.kallsyms0.o
    LD      .tmp_vmlinux.btf             # temporary vmlinux for BTF
    BTF     .btf.vmlinux.bin.o
    LD      .tmp_vmlinux.kallsyms1       # temporary vmlinux for kallsyms step 1
    NM      .tmp_vmlinux.kallsyms1.syms
    KSYMS   .tmp_vmlinux.kallsyms1.S
    AS      .tmp_vmlinux.kallsyms1.o
    LD      .tmp_vmlinux.kallsyms2       # temporary vmlinux for kallsyms step 2
    NM      .tmp_vmlinux.kallsyms2.syms
    KSYMS   .tmp_vmlinux.kallsyms2.S
    AS      .tmp_vmlinux.kallsyms2.o
    LD      vmlinux                      # final vmlinux

This is redundant because the BTF generation and the kallsyms step 1 can
be performed against the same temporary vmlinux.

When both CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS are enabled, we can
reduce the number of link steps by one.

This commit changes the build process as follows:

    KSYMS   .tmp_vmlinux0.kallsyms.S
    AS      .tmp_vmlinux0.kallsyms.o
    LD      .tmp_vmlinux1                # temporary vmlinux for BTF and kallsyms step 1
    BTF     .tmp_vmlinux1.btf.o
    NM      .tmp_vmlinux1.syms
    KSYMS   .tmp_vmlinux1.kallsyms.S
    AS      .tmp_vmlinux1.kallsyms.o
    LD      .tmp_vmlinux2                # temporary vmlinux for kallsyms step 2
    NM      .tmp_vmlinux2.syms
    KSYMS   .tmp_vmlinux2.kallsyms.S
    AS      .tmp_vmlinux2.kallsyms.o
    LD      vmlinux                      # final vmlinux

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
15 months agokbuild: remove PROVIDE() for kallsyms symbols
Masahiro Yamada [Mon, 10 Jun 2024 11:25:17 +0000 (20:25 +0900)]
kbuild: remove PROVIDE() for kallsyms symbols

This reimplements commit 951bcae6c5a0 ("kallsyms: Avoid weak references
for kallsyms symbols") because I am not a big fan of PROVIDE().

As an alternative solution, this commit prepends one more kallsyms step.

    KSYMS   .tmp_vmlinux.kallsyms0.S          # added
    AS      .tmp_vmlinux.kallsyms0.o          # added
    LD      .tmp_vmlinux.btf
    BTF     .btf.vmlinux.bin.o
    LD      .tmp_vmlinux.kallsyms1
    NM      .tmp_vmlinux.kallsyms1.syms
    KSYMS   .tmp_vmlinux.kallsyms1.S
    AS      .tmp_vmlinux.kallsyms1.o
    LD      .tmp_vmlinux.kallsyms2
    NM      .tmp_vmlinux.kallsyms2.syms
    KSYMS   .tmp_vmlinux.kallsyms2.S
    AS      .tmp_vmlinux.kallsyms2.o
    LD      vmlinux

Step 0 takes /dev/null as input, and generates .tmp_vmlinux.kallsyms0.o,
which has a valid kallsyms format with the empty symbol list, and can be
linked to vmlinux. Since it is really small, the added compile-time cost
is negligible.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokbuild: refactor variables in scripts/link-vmlinux.sh
Masahiro Yamada [Mon, 10 Jun 2024 11:25:16 +0000 (20:25 +0900)]
kbuild: refactor variables in scripts/link-vmlinux.sh

Clean up the variables in scripts/link-vmlinux.sh

 - Specify the extra objects directly in vmlinux_link()
 - Move the AS rule to kallsyms()
 - Set kallsymso and btf_vmlinux_bin_o where they are generated
 - Remove unneeded variable, kallsymso_prev
 - Introduce the btf_data variable
 - Introduce the strip_debug flag instead of checking the output name

No functional change intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokconfig: refactor conf_write_defconfig() to reduce indentation level
Masahiro Yamada [Sun, 2 Jun 2024 12:54:16 +0000 (21:54 +0900)]
kconfig: refactor conf_write_defconfig() to reduce indentation level

Reduce the indentation level by continue'ing the loop earlier
if (!sym || sym_is_choice(sym)).

No functional change intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokconfig: refactor conf_set_all_new_symbols() to reduce indentation level
Masahiro Yamada [Sun, 2 Jun 2024 12:54:15 +0000 (21:54 +0900)]
kconfig: refactor conf_set_all_new_symbols() to reduce indentation level

The outer switch statement can be avoided by continue'ing earlier the
loop when the symbol type is neither S_BOOLEAN nor S_TRISTATE.

Remove it to reduce the indentation level by one. In addition, avoid
the repetition of sym->def[S_DEF_USER].tri.

No functional change intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokconfig: remove tristate choice support
Masahiro Yamada [Sun, 2 Jun 2024 12:54:14 +0000 (21:54 +0900)]
kconfig: remove tristate choice support

I previously submitted a fix for a bug in the choice feature [1], where
I mentioned, "Another (much cleaner) approach would be to remove the
tristate choice support entirely".

There are more issues in the tristate choice feature. For example, you
can observe a couple of bugs in the following test code.

[Test Code]

    config MODULES
            def_bool y
            modules

    choice
            prompt "tristate choice"
            default A

    config A
            tristate "A"

    config B
            tristate "B"

    endchoice

Bug 1: the 'default' property is not correctly processed

'make alldefconfig' produces:

    CONFIG_MODULES=y
    # CONFIG_A is not set
    # CONFIG_B is not set

However, the correct output should be:

    CONFIG_MODULES=y
    CONFIG_A=y
    # CONFIG_B is not set

The unit test file, scripts/kconfig/tests/choice/alldef_expected_config,
is wrong as well.

Bug 2: choice members never get 'y' with randconfig

For the test code above, the following combinations are possible:

               A    B
        (1)    y    n
        (2)    n    y
        (3)    m    m
        (4)    m    n
        (5)    n    m
        (6)    n    n

'make randconfig' never produces (1) or (2).

These bugs are fixable, but a more critical problem is the lack of a
sensible syntax to specify the default for the tristate choice.
The default for the choice must be one of the choice members, which
cannot specify any of the patterns (3) through (6) above.

In addition, I have never seen it being used in a useful way.

The following commits removed unnecessary use of tristate choices:

 - df8df5e4bc37 ("usb: get rid of 'choice' for legacy gadget drivers")
 - bfb57ef0544a ("rapidio: remove choice for enumeration")

This commit removes the tristate choice support entirely, which allows
me to delete a lot of code, making further refactoring easier.

Note:
This includes the revert of commit fa64e5f6a35e ("kconfig/symbol.c:
handle choice_values that depend on 'm' symbols"). It was suspicious
because it did not address the root cause but introduced inconsistency
in visibility between choice members and other symbols.

[1]: https://lore.kernel.org/linux-kbuild/20240427104231.2728905-1-masahiroy@kernel.org/T/#m0a1bb6992581462ceca861b409bb33cb8fd7dbae

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
15 months agokconfig: pass new conf_changed value to the callback
Masahiro Yamada [Sat, 1 Jun 2024 18:20:43 +0000 (03:20 +0900)]
kconfig: pass new conf_changed value to the callback

Commit ee06a3ef7e3c ("kconfig: Update config changed flag before calling
callback") pointed out that conf_updated flag must be updated _before_
calling the callback, which needs to know the new value.

Given that, it makes sense to directly pass the new value to the
callback.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: gconf: move conf_changed() definition up
Masahiro Yamada [Sat, 1 Jun 2024 18:20:42 +0000 (03:20 +0900)]
kconfig: gconf: move conf_changed() definition up

Define conf_changed() before its call site to remove the forward
declaration.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: gconf: remove unnecessary forward declarations
Masahiro Yamada [Sat, 1 Jun 2024 18:20:41 +0000 (03:20 +0900)]
kconfig: gconf: remove unnecessary forward declarations

These are defined before their call sites.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agokconfig: qconf: remove initial call to conf_changed()
Masahiro Yamada [Sat, 1 Jun 2024 18:20:39 +0000 (03:20 +0900)]
kconfig: qconf: remove initial call to conf_changed()

If any CONFIG option is changed while loading the .config file,
conf_read() calls conf_set_changed(true) and then the conf_changed()
callback.

With conf_read() moved after window initialization, the explicit
conf_changed() call can be removed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agoinitramfs: shorten cmd_initfs in usr/Makefile
Masahiro Yamada [Fri, 31 May 2024 12:28:25 +0000 (21:28 +0900)]
initramfs: shorten cmd_initfs in usr/Makefile

Avoid repetition of long variables.

No functional change intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
15 months agoLinux 6.10-rc7
Linus Torvalds [Sun, 7 Jul 2024 21:23:46 +0000 (14:23 -0700)]
Linux 6.10-rc7

15 months agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 7 Jul 2024 17:59:38 +0000 (10:59 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A set of clk fixes for the Qualcomm, Mediatek, and Allwinner drivers:

   - Fix the Qualcomm Stromer Plus PLL set_rate() clk_op to explicitly
     set the alpha enable bit and not set bits that don't exist

   - Mark Qualcomm IPQ9574 crypto clks as voted to avoid stuck clk
     warnings

   - Fix the parent of some PLLs on Qualcomm sm6530 so their rate is
     correct

   - Fix the min/max rate clamping logic in the Allwinner driver that
     got broken in v6.9

   - Limit runtime PM enabling in the Mediatek driver to only
     mt8183-mfgcfg so that system wide resume doesn't break on other
     Mediatek SoCs"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: mediatek: mt8183: Only enable runtime PM on mt8183-mfgcfg
  clk: sunxi-ng: common: Don't call hw_to_ccu_common on hw without common
  clk: qcom: gcc-ipq9574: Add BRANCH_HALT_VOTED flag
  clk: qcom: apss-ipq-pll: remove 'config_ctl_hi_val' from Stromer pll configs
  clk: qcom: clk-alpha-pll: set ALPHA_EN bit for Stromer Plus PLLs
  clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents

15 months agoMerge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 7 Jul 2024 01:31:24 +0000 (18:31 -0700)]
Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix unnecessary copy to 0 when kernel is booted at address 0

 - Fix usercopy crash when dumping dtl via debugfs

 - Avoid possible crash when PCI hotplug races with error handling

 - Fix kexec crash caused by scv being disabled before other CPUs
   call-in

 - Fix powerpc selftests build with USERCFLAGS set

Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.

* tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix build with USERCFLAGS set
  powerpc/pseries: Fix scv instruction crash with kexec
  powerpc/eeh: avoid possible crash when edev->pdev changes
  powerpc/pseries: Whitelist dtl slub object for copying to userspace
  powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0

15 months agoMerge tag '6.10-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 6 Jul 2024 23:16:58 +0000 (16:16 -0700)]
Merge tag '6.10-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
 "Fix for smb3 readahead performance regression"

* tag '6.10-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix read-performance regression by dropping readahead expansion

15 months agoMerge tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 6 Jul 2024 16:51:00 +0000 (09:51 -0700)]
Merge tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
 "An i2c driver fix"

* tag 'i2c-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr

15 months agoselftests/powerpc: Fix build with USERCFLAGS set
Michael Ellerman [Sat, 6 Jul 2024 12:08:33 +0000 (22:08 +1000)]
selftests/powerpc: Fix build with USERCFLAGS set

Currently building the powerpc selftests with USERCFLAGS set to anything
causes the build to break:

  $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
  ...
  gcc -Wno-error    cache_shape.c ...
  cache_shape.c:18:10: fatal error: utils.h: No such file or directory
     18 | #include "utils.h"
        |          ^~~~~~~~~
  compilation terminated.

This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which
causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at
all, resulting in none of the usual CFLAGS being passed. That can
be seen in the output above, the only flag passed to the compiler is
-Wno-error.

Fix it by dropping the conditional setting of CFLAGS in flags.mk.
Instead always set CFLAGS, but also append USERCFLAGS if they are set.

Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk
is included by multiple Makefiles (to support partial builds), causing
CFLAGS to be appended to multiple times. Additionally that would place
the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS
couldn't override the standard flags. Being able to override the
standard flags is desirable, for example for adding -Wno-error.

With the fix in place, the CFLAGS are set correctly, including the
USERCFLAGS:

  $ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
  ...
  gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v6.10-rc2-7-gdea17e7e56c3"'
  -I/home/michael/linux/tools/testing/selftests/powerpc/include -Wno-error
  cache_shape.c ...

Fixes: 5553a79387e9 ("selftests/powerpc: Add flags.mk to support pmu buildable")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240706120833.909853-1-mpe@ellerman.id.au
15 months agoMerge tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar...
Linus Torvalds [Fri, 5 Jul 2024 23:21:54 +0000 (16:21 -0700)]
Merge tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fix from Mimi Zohar:
 "A single bug fix to properly remove all of the securityfs IMA
  measurement lists"

* tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: fix wrong zero-assignment during securityfs dentry remove

15 months agoMerge tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 5 Jul 2024 19:33:00 +0000 (12:33 -0700)]
Merge tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci update from Bjorn Helgaas:

 - Update MAINTAINERS and CREDITS to credit Gustavo Pimentel with the
   Synopsys DesignWare eDMA driver and reflect that he is no longer at
   Synopsys and isn't in a position to maintain the DesignWare xData
   traffic generator (Bjorn Helgaas)

* tag 'pci-v6.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  CREDITS: Add Synopsys DesignWare eDMA driver for Gustavo Pimentel
  MAINTAINERS: Orphan Synopsys DesignWare xData traffic generator

15 months agoMerge tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 5 Jul 2024 19:22:51 +0000 (12:22 -0700)]
Merge tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for the CMODX example in the recently added icache flushing
   prctl()

 - A fix to the perf driver to avoid corrupting event data on counter
   overflows when external overflow handlers are in use

 - A fix to clear all hardware performance monitor events on boot, to
   avoid dangling events firmware or previously booted kernels from
   triggering spuriously

 - A fix to the perf event probing logic to avoid erroneously reporting
   the presence of unimplemented counters. This also prevents some
   implemented counters from being reported

 - A build fix for the vector sigreturn selftest on clang

 - A fix to ftrace, which now requires the previously optional index
   argument to ftrace_graph_ret_addr()

 - A fix to avoid deadlocking if kexec crash handling triggers in an
   interrupt context

* tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: kexec: Avoid deadlock in kexec crash path
  riscv: stacktrace: fix usage of ftrace_graph_ret_addr()
  riscv: selftests: Fix vsetivli args for clang
  perf: RISC-V: Check standard event availability
  drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
  drivers/perf: riscv: Do not update the event data if uptodate
  documentation: Fix riscv cmodx example

15 months agoMerge tag 'drm-fixes-2024-07-05' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 5 Jul 2024 18:53:40 +0000 (11:53 -0700)]
Merge tag 'drm-fixes-2024-07-05' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Daniel Vetter:
 "Just small fixes all over here, all quiet as it should.

  drivers:

   - amd: mostly amdgpu display fixes + radeon vm NULL deref fix

   - xe: migration error handling + typoed register name in gt setup

   - i915: usb-c fix to shut up warnings on MTL+

   - panthor: fix sync-only jobs + ioctl validation fix to not EINVAL
     wrongly

   - panel quirks

   - nouveau: NULL deref in get_modes

  drm core:

   - fbdev big endian fix for the dma memory backed variant

  drivers/firmware:

   - fix sysfb refcounting"

* tag 'drm-fixes-2024-07-05' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/mcr: Avoid clobbering DSS steering
  drm/xe: fix error handling in xe_migrate_update_pgtables
  drm/ttm: Always take the bo delayed cleanup path for imported bos
  drm/fbdev-generic: Fix framebuffer on big endian devices
  drm/panthor: Fix sync-only jobs
  drm/panthor: Don't check the array stride on empty uobj arrays
  drm/amdgpu/atomfirmware: silence UBSAN warning
  drm/radeon: check bo_va->bo is non-NULL before using it
  drm/amd/display: Fix array-index-out-of-bounds in dml2/FCLKChangeSupport
  drm/amd/display: Update efficiency bandwidth for dcn351
  drm/amd/display: Fix refresh rate range for some panel
  drm/amd/display: Account for cursor prefetch BW in DML1 mode support
  drm/amd/display: Add refresh rate range check
  drm/amd/display: Reset freesync config before update new state
  drm: panel-orientation-quirks: Add labels for both Valve Steam Deck revisions
  drm: panel-orientation-quirks: Add quirk for Valve Galileo
  drm/i915/display: For MTL+ platforms skip mg dp programming
  drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes
  firmware: sysfb: Fix reference count of sysfb parent device

15 months agoMerge tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 5 Jul 2024 18:39:30 +0000 (11:39 -0700)]
Merge tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "Two OF lookup quirks and one fix for an issue in the generic gpio-mmio
  driver:

   - add two OF lookup quirks for TSC2005 and MIPS Lantiq

   - don't try to figure out bgpio_bits from the 'ngpios' property in
     gpio-mmio"

* tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: of: add polarity quirk for TSC2005
  gpio: mmio: do not calculate bgpio_bits via "ngpios"
  gpiolib: of: fix lookup quirk for MIPS Lantiq

15 months agoMerge tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 5 Jul 2024 18:30:57 +0000 (11:30 -0700)]
Merge tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull TPM fixes from Jarkko Sakkinen:
 "This contains the fixes for !chip->auth condition, preventing the
  breakage of:
   - tpm_ftpm_tee.c
   - tpm_i2c_nuvoton.c
   - tpm_ibmvtpm.c
   - tpm_tis_i2c_cr50.c
   - tpm_vtpm_proxy.c

  All drivers will continue to work as they did in 6.9, except a single
  warning (dev_warn() not WARN()) is printed to klog only to inform that
  authenticated sessions are not enabled"

* tag 'tpmdd-next-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Address !chip->auth in tpm_buf_append_hmac_session*()
  tpm: Address !chip->auth in tpm_buf_append_name()
  tpm: Address !chip->auth in tpm2_*_auth_session()

15 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 5 Jul 2024 18:23:30 +0000 (11:23 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:

 - s390: fix support for z16 systems

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: fix LPSWEY handling

15 months agoMerge tag 'i2c-host-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Wolfram Sang [Fri, 5 Jul 2024 14:08:55 +0000 (16:08 +0200)]
Merge tag 'i2c-host-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

This tag includes a nice fix in the PNX driver that has been
pending for a long time. Piotr has replaced a potential lock in
the interrupt context with a more efficient and straightforward
handling of the timeout signaling.

15 months agoMerge tag 'amd-drm-fixes-6.10-2024-07-03' of https://gitlab.freedesktop.org/agd5f...
Daniel Vetter [Fri, 5 Jul 2024 10:54:13 +0000 (12:54 +0200)]
Merge tag 'amd-drm-fixes-6.10-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.10-2024-07-03:

amdgpu:
- Freesync fixes
- DML1 bandwidth fix
- DCN 3.5 fixes
- DML2 fix
- Silence an UBSAN warning

radeon:
- GPUVM fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703184723.1981997-1-alexander.deucher@amd.com
15 months agogpiolib: of: add polarity quirk for TSC2005
Dmitry Torokhov [Wed, 3 Jul 2024 18:26:09 +0000 (11:26 -0700)]
gpiolib: of: add polarity quirk for TSC2005

DTS for Nokia N900 incorrectly specifies "active high" polarity for
the reset line, while the chip documentation actually specifies it as
"active low".  In the past the driver fudged gpiod API and inverted
the logic internally, but it was changed in d0d89493bff8.

Fixes: d0d89493bff8 ("Input: tsc2004/5 - switch to using generic device properties")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/ZoWXwYtwgJIxi-hD@google.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
15 months agoMerge tag 'kvm-s390-master-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Fri, 5 Jul 2024 08:45:53 +0000 (04:45 -0400)]
Merge tag 'kvm-s390-master-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix z16 support

The z16 support might fail with the lpswey instruction. Provide a
handler.

15 months agotpm: Address !chip->auth in tpm_buf_append_hmac_session*()
Jarkko Sakkinen [Wed, 3 Jul 2024 15:47:46 +0000 (18:47 +0300)]
tpm: Address !chip->auth in tpm_buf_append_hmac_session*()

Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can
cause a null derefence in tpm_buf_hmac_session*().  Thus, address
!chip->auth in tpm_buf_hmac_session*() and remove the fallback
implementation for !TCG_TPM2_HMAC.

Cc: stable@vger.kernel.org # v6.9+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/
Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
15 months agotpm: Address !chip->auth in tpm_buf_append_name()
Jarkko Sakkinen [Wed, 3 Jul 2024 15:33:14 +0000 (18:33 +0300)]
tpm: Address !chip->auth in tpm_buf_append_name()

Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can
cause a null derefence in tpm_buf_append_name().  Thus, address
!chip->auth in tpm_buf_append_name() and remove the fallback
implementation for !TCG_TPM2_HMAC.

Cc: stable@vger.kernel.org # v6.10+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/
Fixes: d0a25bb961e6 ("tpm: Add HMAC session name/handle append")
Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
15 months agotpm: Address !chip->auth in tpm2_*_auth_session()
Jarkko Sakkinen [Wed, 3 Jul 2024 16:39:27 +0000 (19:39 +0300)]
tpm: Address !chip->auth in tpm2_*_auth_session()

Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause
a null derefence in tpm2_*_auth_session(). Thus, address !chip->auth in
tpm2_*_auth_session().

Cc: stable@vger.kernel.org # v6.9+
Reported-by: Stefan Berger <stefanb@linux.ibm.com>
Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
15 months agoMerge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 4 Jul 2024 17:27:37 +0000 (10:27 -0700)]
Merge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix folio refcounting when releasing them (encoded write, dummy
   extent buffer)

 - fix out of bounds read when checking qgroup inherit data

 - fix how configurable chunk size is handled in zoned mode

 - in the ref-verify tool, fix uninitialized return value when checking
   extent owner ref and simple quota are not enabled

* tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix folio refcount in __alloc_dummy_extent_buffer()
  btrfs: fix folio refcount in btrfs_do_encoded_write()
  btrfs: fix uninitialized return value in the ref-verify tool
  btrfs: always do the basic checks for btrfs_qgroup_inherit structure
  btrfs: zoned: fix calc_available_free_space() for zoned mode

15 months agoMerge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 4 Jul 2024 17:11:12 +0000 (10:11 -0700)]
Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, wireless and netfilter.

  There's one fix for power management with Intel's e1000e here,
  Thorsten tells us there's another problem that started in v6.9. We're
  trying to wrap that up but I don't think it's blocking.

  Current release - new code bugs:

   - wifi: mac80211: disable softirqs for queued frame handling

   - af_unix: fix uninit-value in __unix_walk_scc(), with the new
     garbage collection algo

  Previous releases - regressions:

   - Bluetooth:
      - qca: fix BT enable failure for QCA6390 after warm reboot
      - add quirk to ignore reserved PHY bits in LE Extended Adv Report,
        abused by some Broadcom controllers found on Apple machines

   - wifi: wilc1000: fix ies_len type in connect path

  Previous releases - always broken:

   - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
     avoid premature timeouts

   - net: make sure skb_datagram_iter maps fragments page by page, in
     case we somehow get compound highmem mixed in

   - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
     queues are used

  Misc:

   - MAINTAINERS: Remembering Larry Finger"

* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  bnxt_en: Fix the resource check condition for RSS contexts
  mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
  inet_diag: Initialize pad field in struct inet_diag_req_v2
  tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
  selftests: make order checking verbose in msg_zerocopy selftest
  selftests: fix OOM in msg_zerocopy selftest
  ice: use proper macro for testing bit
  ice: Reject pin requests with unsupported flags
  ice: Don't process extts if PTP is disabled
  ice: Fix improper extts handling
  selftest: af_unix: Add test case for backtrack after finalising SCC.
  af_unix: Fix uninit-value in __unix_walk_scc()
  bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
  net: rswitch: Avoid use-after-free in rswitch_poll()
  netfilter: nf_tables: unconditionally flush pending work before notifier
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  ...

15 months agoMerge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 4 Jul 2024 16:46:15 +0000 (09:46 -0700)]
Merge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix and add physical to virtual address translations in dasd and
   virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
   More code cleanup will follow.

 - Small defconfig updates

* tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
  s390/vfio_ccw: Fix target addresses of TIC CCWs
  s390: Update defconfigs

15 months agoMerge tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 4 Jul 2024 16:36:42 +0000 (09:36 -0700)]
Merge tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fix from Hans de Goede:

 - Fix regression in toshiba_acpi introduced in 6.10-rc1

* tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: toshiba_acpi: Fix quickstart quirk handling

15 months agoMerge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 4 Jul 2024 16:29:42 +0000 (09:29 -0700)]
Merge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Kselftest fix from Mickaël Salaün:
 "Fix Kselftests timeout.

  We can't use CLONE_VFORK, since that blocks the parent - and thus the
  timeout handling - until the child exits or execve's.

  Go back to using plain fork()"

* tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/harness: Fix tests timeout and race condition

15 months agoMerge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 4 Jul 2024 16:13:02 +0000 (09:13 -0700)]
Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from, Andrew Morton:
 "6 hotfies, all cc:stable. Some fixes for longstanding nilfs2 issues
  and three unrelated MM fixes"

* tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix incorrect inode allocation from reserved inodes
  nilfs2: add missing check for inode numbers on directory entries
  nilfs2: fix inode number range checks
  mm: avoid overflows in dirty throttling logic
  Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
  mm: optimize the redundant loop of mm_update_owner_next()

15 months agoMerge tag 'drm-misc-fixes-2024-07-04' of https://gitlab.freedesktop.org/drm/misc...
Daniel Vetter [Thu, 4 Jul 2024 14:48:02 +0000 (16:48 +0200)]
Merge tag 'drm-misc-fixes-2024-07-04' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.10-rc7:
- Add panel quirks.
- Firmware sysfb refcount fix.
- Another null pointer mode deref fix for nouveau.
- Panthor sync and uobj fixes.
- Fix fbdev regression since v6.7.
- Delay free imported bo in ttm to fix lockdep splat.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ffba0c63-2798-40b6-948d-361cd3b14e9f@linux.intel.com
15 months agoMerge tag 'drm-xe-fixes-2024-07-04' of https://gitlab.freedesktop.org/drm/xe/kernel...
Daniel Vetter [Thu, 4 Jul 2024 14:44:16 +0000 (16:44 +0200)]
Merge tag 'drm-xe-fixes-2024-07-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- One copy/paste mistake fix.
- One error path fix causing an error pointer dereference.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZoZ-wD66lgjiNh72@fedora
15 months agobnxt_en: Fix the resource check condition for RSS contexts
Pavan Chebbi [Wed, 3 Jul 2024 18:01:12 +0000 (11:01 -0700)]
bnxt_en: Fix the resource check condition for RSS contexts

While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available.  If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.

There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts.  It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.

Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted.  The resources will be available for
use when new contexts are created later.

Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts.  In the future, we will conditionally
disable this workaround when the firmware fix is available.

Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agomlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
Aleksandr Mishin [Wed, 3 Jul 2024 20:32:51 +0000 (23:32 +0300)]
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file

In case of invalid INI file mlxsw_linecard_types_init() deallocates memory
but doesn't reset pointer to NULL and returns 0. In case of any error
occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init()
calls mlxsw_linecard_types_fini() which performs memory deallocation again.

Add pointer reset to NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoMerge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 4 Jul 2024 14:31:54 +0000 (07:31 -0700)]
Merge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.10

Hopefully the last fixes for v6.10. Fix a regression in wilc1000
where bitrate Information Elements longer than 255 bytes were broken.
Few fixes also to mac80211 and iwlwifi.

* tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
====================

Link: https://patch.msgid.link/20240704111431.11DEDC3277B@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoMerge tag 'drm-intel-fixes-2024-07-02' of https://gitlab.freedesktop.org/drm/i915...
Daniel Vetter [Thu, 4 Jul 2024 14:14:17 +0000 (16:14 +0200)]
Merge tag 'drm-intel-fixes-2024-07-02' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

drm/i915 fixes for v6.10-rc7:
- Skip unnecessary MG programming, avoiding warnings (Imre)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87frss9ozs.fsf@intel.com
15 months agoMerge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 4 Jul 2024 13:31:26 +0000 (15:31 +0200)]
Merge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains a oneliner patch to inconditionally flush
workqueue containing stale objects to be released, syzbot managed to
trigger UaF. Patch from Florian Westphal.

netfilter pull request 24-07-04

* tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: unconditionally flush pending work before notifier
====================

Link: https://patch.msgid.link/20240703223304.1455-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
15 months agoinet_diag: Initialize pad field in struct inet_diag_req_v2
Shigeru Yoshida [Wed, 3 Jul 2024 09:16:49 +0000 (18:16 +0900)]
inet_diag: Initialize pad field in struct inet_diag_req_v2

KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.

inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.

Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.

[1]
BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
 raw_lookup net/ipv4/raw_diag.c:49 [inline]
 raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
 inet_diag_cmd_exact+0x7d9/0x980
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
 inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x332/0x3d0 net/socket.c:745
 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
 __sys_sendmsg net/socket.c:2668 [inline]
 __do_sys_sendmsg net/socket.c:2677 [inline]
 __se_sys_sendmsg net/socket.c:2675 [inline]
 __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was stored to memory at:
 raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
 inet_diag_cmd_exact+0x7d9/0x980
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
 inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x332/0x3d0 net/socket.c:745
 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
 __sys_sendmsg net/socket.c:2668 [inline]
 __do_sys_sendmsg net/socket.c:2677 [inline]
 __se_sys_sendmsg net/socket.c:2675 [inline]
 __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable req.i created at:
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
 inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282

CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014

Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
15 months agotcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
Kuniyuki Iwashima [Wed, 3 Jul 2024 03:35:08 +0000 (20:35 -0700)]
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.

When we process segments with TCP AO, we don't check it in
tcp_parse_options().  Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.

Let's avoid the unnecessary BPF invocation.

Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
15 months agodrm/xe/mcr: Avoid clobbering DSS steering
Matt Roper [Wed, 26 Jun 2024 21:05:37 +0000 (14:05 -0700)]
drm/xe/mcr: Avoid clobbering DSS steering

A couple copy/paste mistakes in the code that selects steering targets
for OADDRM and INSTANCE0 unintentionally clobbered the steering target
for DSS ranges in some cases.

The OADDRM/INSTANCE0 values were also not assigned as intended, although
that mistake wound up being harmless since the desired values for those
specific ranges were '0' which the kzalloc of the GT structure should
have already taken care of implicitly.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240626210536.1620176-2-matthew.d.roper@intel.com
(cherry picked from commit 4f82ac6102788112e599a6074d2c1f2afce923df)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
15 months agodrm/xe: fix error handling in xe_migrate_update_pgtables
Matthew Auld [Thu, 20 Jun 2024 10:20:26 +0000 (11:20 +0100)]
drm/xe: fix error handling in xe_migrate_update_pgtables

Don't call drm_suballoc_free with sa_bo pointing to PTR_ERR.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2120
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240620102025.127699-2-matthew.auld@intel.com
(cherry picked from commit ce6b63336f79ec5f3996de65f452330e395f99ae)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
15 months agodrm/ttm: Always take the bo delayed cleanup path for imported bos
Thomas Hellström [Fri, 28 Jun 2024 15:38:48 +0000 (17:38 +0200)]
drm/ttm: Always take the bo delayed cleanup path for imported bos

Bos can be put with multiple unrelated dma-resv locks held. But
imported bos attempt to grab the bo dma-resv during dma-buf detach
that typically happens during cleanup. That leads to lockde splats
similar to the below and a potential ABBA deadlock.

Fix this by always taking the delayed workqueue cleanup path for
imported bos.

Requesting stable fixes from when the Xe driver was introduced,
since its usage of drm_exec and wide vm dma_resvs appear to be
the first reliable trigger of this.

[22982.116427] ============================================
[22982.116428] WARNING: possible recursive locking detected
[22982.116429] 6.10.0-rc2+ #10 Tainted: G     U  W
[22982.116430] --------------------------------------------
[22982.116430] glxgears:sh0/5785 is trying to acquire lock:
[22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0
[22982.116438]
               but task is already holding lock:
[22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116442]
               other info that might help us debug this:
[22982.116442]  Possible unsafe locking scenario:

[22982.116443]        CPU0
[22982.116444]        ----
[22982.116444]   lock(reservation_ww_class_mutex);
[22982.116445]   lock(reservation_ww_class_mutex);
[22982.116447]
                *** DEADLOCK ***

[22982.116447]  May be due to missing lock nesting notation

[22982.116448] 5 locks held by glxgears:sh0/5785:
[22982.116449]  #0: ffff8c2d9aba58c8 (&xef->vm.lock){+.+.}-{3:3}, at: xe_file_close+0xde/0x1c0 [xe]
[22982.116507]  #1: ffff8c2e28cc8480 (&vm->lock){++++}-{3:3}, at: xe_vm_close_and_put+0x161/0x9b0 [xe]
[22982.116578]  #2: ffff8c2e31982970 (&val->lock){.+.+}-{3:3}, at: xe_validation_ctx_init+0x6d/0x70 [xe]
[22982.116647]  #3: ffffacdc469478a8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: xe_vma_destroy_unlocked+0x7f/0xe0 [xe]
[22982.116716]  #4: ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116719]
               stack backtrace:
[22982.116720] CPU: 8 PID: 5785 Comm: glxgears:sh0 Tainted: G     U  W          6.10.0-rc2+ #10
[22982.116721] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[22982.116723] Call Trace:
[22982.116724]  <TASK>
[22982.116725]  dump_stack_lvl+0x77/0xb0
[22982.116727]  __lock_acquire+0x1232/0x2160
[22982.116730]  lock_acquire+0xcb/0x2d0
[22982.116732]  ? dma_buf_detach+0x3b/0xf0
[22982.116734]  ? __lock_acquire+0x417/0x2160
[22982.116736]  __ww_mutex_lock.constprop.0+0xd0/0x13b0
[22982.116738]  ? dma_buf_detach+0x3b/0xf0
[22982.116741]  ? dma_buf_detach+0x3b/0xf0
[22982.116743]  ? ww_mutex_lock+0x2b/0x90
[22982.116745]  ww_mutex_lock+0x2b/0x90
[22982.116747]  dma_buf_detach+0x3b/0xf0
[22982.116749]  drm_prime_gem_destroy+0x2f/0x40 [drm]
[22982.116775]  xe_ttm_bo_destroy+0x32/0x220 [xe]
[22982.116818]  ? __mutex_unlock_slowpath+0x3a/0x290
[22982.116821]  drm_exec_unlock_all+0xa1/0xd0 [drm_exec]
[22982.116823]  drm_exec_fini+0x12/0xb0 [drm_exec]
[22982.116824]  xe_validation_ctx_fini+0x15/0x40 [xe]
[22982.116892]  xe_vma_destroy_unlocked+0xb1/0xe0 [xe]
[22982.116959]  xe_vm_close_and_put+0x41a/0x9b0 [xe]
[22982.117025]  ? xa_find+0xe3/0x1e0
[22982.117028]  xe_file_close+0x10a/0x1c0 [xe]
[22982.117074]  drm_file_free+0x22a/0x280 [drm]
[22982.117099]  drm_release_noglobal+0x22/0x70 [drm]
[22982.117119]  __fput+0xf1/0x2d0
[22982.117122]  task_work_run+0x59/0x90
[22982.117125]  do_exit+0x330/0xb40
[22982.117127]  do_group_exit+0x36/0xa0
[22982.117129]  get_signal+0xbd2/0xbe0
[22982.117131]  arch_do_signal_or_restart+0x3e/0x240
[22982.117134]  syscall_exit_to_user_mode+0x1e7/0x290
[22982.117137]  do_syscall_64+0xa1/0x180
[22982.117139]  ? lock_acquire+0xcb/0x2d0
[22982.117140]  ? __set_task_comm+0x28/0x1e0
[22982.117141]  ? find_held_lock+0x2b/0x80
[22982.117144]  ? __set_task_comm+0xe1/0x1e0
[22982.117145]  ? lock_release+0xca/0x290
[22982.117147]  ? __do_sys_prctl+0x245/0xab0
[22982.117149]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[22982.117150]  ? syscall_exit_to_user_mode+0xb0/0x290
[22982.117152]  ? do_syscall_64+0xa1/0x180
[22982.117154]  ? __lock_acquire+0x417/0x2160
[22982.117155]  ? reacquire_held_locks+0xd1/0x1f0
[22982.117156]  ? do_user_addr_fault+0x30c/0x790
[22982.117158]  ? lock_acquire+0xcb/0x2d0
[22982.117160]  ? find_held_lock+0x2b/0x80
[22982.117162]  ? do_user_addr_fault+0x357/0x790
[22982.117163]  ? lock_release+0xca/0x290
[22982.117164]  ? do_user_addr_fault+0x361/0x790
[22982.117166]  ? trace_hardirqs_off+0x4b/0xc0
[22982.117168]  ? clear_bhb_loop+0x45/0xa0
[22982.117170]  ? clear_bhb_loop+0x45/0xa0
[22982.117172]  ? clear_bhb_loop+0x45/0xa0
[22982.117174]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22982.117176] RIP: 0033:0x7f943d267169
[22982.117192] Code: Unable to access opcode bytes at 0x7f943d26713f.
[22982.117193] RSP: 002b:00007f9430bffc80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22982.117195] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f943d267169
[22982.117196] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00005622f89579d0
[22982.117197] RBP: 00007f9430bffcb0 R08: 0000000000000000 R09: 00000000ffffffff
[22982.117198] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[22982.117199] R13: 0000000000000000 R14: 0000000000000000 R15: 00005622f89579d0
[22982.117202]  </TASK>

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628153848.4989-1-thomas.hellstrom@linux.intel.com
15 months agoMerge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest'
Jakub Kicinski [Thu, 4 Jul 2024 02:42:33 +0000 (19:42 -0700)]
Merge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest'

Zijian Zhang says:

====================
fix OOM and order check in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================

Link: https://patch.msgid.link/20240701225349.3395580-1-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoselftests: make order checking verbose in msg_zerocopy selftest
Zijian Zhang [Mon, 1 Jul 2024 22:53:49 +0000 (22:53 +0000)]
selftests: make order checking verbose in msg_zerocopy selftest

We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoselftests: fix OOM in msg_zerocopy selftest
Zijian Zhang [Mon, 1 Jul 2024 22:53:48 +0000 (22:53 +0000)]
selftests: fix OOM in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoMerge branch 'intel-wired-lan-driver-updates-2024-06-25-ice'
Jakub Kicinski [Thu, 4 Jul 2024 02:36:53 +0000 (19:36 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2024-06-25-ice'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-25 (ice)

This series contains updates to ice driver only.

Milena adds disabling of extts events when PTP is disabled.

Jake prevents possible NULL pointer by checking that timestamps are
ready before processing extts events and adds checks for unsupported
PTP pin configuration.

Petr Oros replaces _test_bit() with the correct test_bit() macro.
v1: https://lore.kernel.org/netdev/20240625170248.199162-1-anthony.l.nguyen@intel.com/
====================

Link: https://patch.msgid.link/20240702171459.2606611-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoice: use proper macro for testing bit
Petr Oros [Tue, 2 Jul 2024 17:14:57 +0000 (10:14 -0700)]
ice: use proper macro for testing bit

Do not use _test_bit() macro for testing bit. The proper macro for this
is one without underline.

_test_bit() is what test_bit() was prior to const-optimization. It
directly calls arch_test_bit(), i.e. the arch-specific implementation
(or the generic one). It's strictly _internal_ and shouldn't be used
anywhere outside the actual test_bit() macro.

test_bit() is a wrapper which checks whether the bitmap and the bit
number are compile-time constants and if so, it calls the optimized
function which evaluates this call to a compile-time constant as well.
If either of them is not a compile-time constant, it just calls _test_bit().
test_bit() is the actual function to use anywhere in the kernel.

IOW, calling _test_bit() avoids potential compile-time optimizations.

The sensors is not a compile-time constant, thus most probably there
are no object code changes before and after the patch.
But anyway, we shouldn't call internal wrappers instead of
the actual API.

Fixes: 4da71a77fc3b ("ice: read internal temperature sensor")
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoice: Reject pin requests with unsupported flags
Jacob Keller [Tue, 2 Jul 2024 17:14:56 +0000 (10:14 -0700)]
ice: Reject pin requests with unsupported flags

The driver receives requests for configuring pins via the .enable
callback of the PTP clock object. These requests come into the driver
with flags which modify the requested behavior from userspace. Current
implementation in ice does not reject flags that it doesn't support.
This causes the driver to incorrectly apply requests with such flags as
PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it
is not yet aware of.

Fix this by properly validating flags in both ice_ptp_cfg_perout and
ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported
flags rather than just checking and rejecting known un-supported flags.
This is preferable, as it ensures better compatibility with future
kernels.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoice: Don't process extts if PTP is disabled
Jacob Keller [Tue, 2 Jul 2024 17:14:55 +0000 (10:14 -0700)]
ice: Don't process extts if PTP is disabled

The ice_ptp_extts_event() function can race with ice_ptp_release() and
result in a NULL pointer dereference which leads to a kernel panic.

Panic occurs because the ice_ptp_extts_event() function calls
ptp_clock_event() with a NULL pointer. The ice driver has already
released the PTP clock by the time the interrupt for the next external
timestamp event occurs.

To fix this, modify the ice_ptp_extts_event() function to check the
PTP state and bail early if PTP is not ready.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoice: Fix improper extts handling
Milena Olech [Tue, 2 Jul 2024 17:14:54 +0000 (10:14 -0700)]
ice: Fix improper extts handling

Extts events are disabled and enabled by the application ts2phc.
However, in case where the driver is removed when the application is
running, a specific extts event remains enabled and can cause a kernel
crash.
As a side effect, when the driver is reloaded and application is started
again, remaining extts event for the channel from a previous run will
keep firing and the message "extts on unexpected channel" might be
printed to the user.

To avoid that, extts events shall be disabled when PTP is released.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoselftest: af_unix: Add test case for backtrack after finalising SCC.
Kuniyuki Iwashima [Tue, 2 Jul 2024 16:04:28 +0000 (01:04 +0900)]
selftest: af_unix: Add test case for backtrack after finalising SCC.

syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.

Let's add a test case exercising the path.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoaf_unix: Fix uninit-value in __unix_walk_scc()
Shigeru Yoshida [Tue, 2 Jul 2024 16:04:27 +0000 (01:04 +0900)]
af_unix: Fix uninit-value in __unix_walk_scc()

KMSAN reported uninit-value access in __unix_walk_scc() [1].

In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).

However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.

Fix this by introducing a new temporary variable for the loop.

[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 __unix_walk_scc net/unix/garbage.c:478 [inline]
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Local variable entries created at:
 ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
 netdev_tracker_free include/linux/netdevice.h:4058 [inline]
 netdev_put include/linux/netdevice.h:4075 [inline]
 dev_put include/linux/netdevice.h:4101 [inline]
 update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813

CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc

Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agobonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
Sam Sun [Tue, 2 Jul 2024 13:55:55 +0000 (14:55 +0100)]
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()

In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.

BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:364 [inline]
 print_report+0xc1/0x5e0 mm/kasan/report.c:475
 kasan_report+0xbe/0xf0 mm/kasan/report.c:588
 strlen+0x7d/0xa0 lib/string.c:418
 __fortify_strlen include/linux/fortify-string.h:210 [inline]
 in4_pton+0xa3/0x3f0 net/core/utils.c:130
 bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
 dev_attr_store+0x54/0x80 drivers/base/core.c:2366
 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
 call_write_iter include/linux/fs.h:2020 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x96a/0xd80 fs/read_write.c:584
 ksys_write+0x122/0x250 fs/read_write.c:637
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---

Fix it by adding a check of string length before using it.

Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <samsun1006219@gmail.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agonet: rswitch: Avoid use-after-free in rswitch_poll()
Radu Rendec [Tue, 2 Jul 2024 21:08:37 +0000 (17:08 -0400)]
net: rswitch: Avoid use-after-free in rswitch_poll()

The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.

Let's move around the instructions to use skb->len before the skb is
freed.

This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.

Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agobtrfs: fix folio refcount in __alloc_dummy_extent_buffer()
Boris Burkov [Tue, 2 Jul 2024 14:31:14 +0000 (07:31 -0700)]
btrfs: fix folio refcount in __alloc_dummy_extent_buffer()

Another improper use of __folio_put() in an error path after freshly
allocating pages/folios which returns them with the refcount initialized
to 1. The refactor from __free_pages() -> __folio_put() (instead of
folio_put) removed a refcount decrement found in __free_pages() and
folio_put but absent from __folio_put().

Fixes: 13df3775efca ("btrfs: cleanup metadata page pointer usage")
CC: stable@vger.kernel.org # 6.8+
Tested-by: Ed Tomlinson <edtoml@gmail.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
15 months agobtrfs: fix folio refcount in btrfs_do_encoded_write()
Boris Burkov [Tue, 2 Jul 2024 14:31:13 +0000 (07:31 -0700)]
btrfs: fix folio refcount in btrfs_do_encoded_write()

The conversion to folios switched __free_page() to __folio_put() in the
error path in btrfs_do_encoded_write().

However, this gets the page refcounting wrong. If we do hit that error
path (I reproduced by modifying btrfs_do_encoded_write to pretend to
always fail in a way that jumps to out_folios and running the fstests
case btrfs/281), then we always hit the following BUG freeing the folio:

  BUG: Bad page state in process btrfs  pfn:40ab0b
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x61be5 pfn:0x40ab0b
   flags: 0x5ffff0000000000(node=0|zone=2|lastcpupid=0x1ffff)
  raw: 05ffff0000000000 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000061be5 0000000000000000 00000001ffffffff 0000000000000000
  page dumped because: nonzero _refcount
  Call Trace:
  <TASK>
  dump_stack_lvl+0x3d/0xe0
  bad_page+0xea/0xf0
  free_unref_page+0x8e1/0x900
  ? __mem_cgroup_uncharge+0x69/0x90
  __folio_put+0xe6/0x190
  btrfs_do_encoded_write+0x445/0x780
  ? current_time+0x25/0xd0
  btrfs_do_write_iter+0x2cc/0x4b0
  btrfs_ioctl_encoded_write+0x2b6/0x340

It turns out __free_page() decreases the page reference count while
__folio_put() does not. Switch __folio_put() to folio_put() which
decreases the folio reference count first.

Fixes: 400b172b8cdc ("btrfs: compression: migrate compression/decompression paths to folios")
Tested-by: Ed Tomlinson <edtoml@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
15 months agonetfilter: nf_tables: unconditionally flush pending work before notifier
Florian Westphal [Tue, 2 Jul 2024 14:08:14 +0000 (16:08 +0200)]
netfilter: nf_tables: unconditionally flush pending work before notifier

syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

Problem is that the notifier does a conditional flush, but its possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.

We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.

AFAICS this problem is only exposed in nf-next, with
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"),
with this commit applied there is an unconditional fetch of
table->family which is whats triggering the above splat.

Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
15 months agoi2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
Piotr Wojtaszczyk [Fri, 28 Jun 2024 15:25:42 +0000 (17:25 +0200)]
i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr

When del_timer_sync() is called in an interrupt context it throws a warning
because of potential deadlock. The timer is used only to exit from
wait_for_completion() after a timeout so replacing the call with
wait_for_completion_timeout() allows to remove the problematic timer and
its related functions altogether.

Fixes: 41561f28e76a ("i2c: New Philips PNX bus driver")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
15 months agoMerge tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Wed, 3 Jul 2024 21:54:35 +0000 (14:54 -0700)]
Merge tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix ioctl conflict with memmapped ring buffer ioctl

  It was reported that the ioctl() number used to update the ring buffer
  memory mapping conflicted with the TCGETS ioctl causing strace to
  report:

    $ strace -e ioctl stty
    ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0

  Since this ioctl hasn't been in a full release yet, change it from
  "T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
  commands, as some more are being worked on for the future"

* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

15 months agotracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F
Steven Rostedt (Google) [Tue, 2 Jul 2024 19:33:54 +0000 (15:33 -0400)]
tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

To prevent conflicts with other ioctl numbers to allow strace to have an
idea of what is happening, add the range of ioctls for the trace buffer
mapping from _IO("T", 0x1) to the range of "R" 0x20 - 0x2F.

Link: https://lore.kernel.org/linux-trace-kernel/20240630105322.GA17573@altlinux.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240630213626.GA23566@altlinux.org/
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Link: https://lore.kernel.org/20240702153354.367861db@rorschach.local.home
Reported-by: "Dmitry V. Levin" <ldv@strace.io>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
15 months agoriscv: kexec: Avoid deadlock in kexec crash path
Song Shuai [Wed, 26 Jun 2024 02:33:16 +0000 (10:33 +0800)]
riscv: kexec: Avoid deadlock in kexec crash path

If the kexec crash code is called in the interrupt context, the
machine_kexec_mask_interrupts() function will trigger a deadlock while
trying to acquire the irqdesc spinlock and then deactivate irqchip in
irq_set_irqchip_state() function.

Unlike arm64, riscv only requires irq_eoi handler to complete EOI and
keeping irq_set_irqchip_state() will only leave this possible deadlock
without any use. So we simply remove it.

Link: https://lore.kernel.org/linux-riscv/20231208111015.173237-1-songshuaishuai@tinylab.org/
Fixes: b17d19a5314a ("riscv: kexec: Fixup irq controller broken in kexec crash path")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Ryo Takakura <takakura@valinux.co.jp>
Link: https://lore.kernel.org/r/20240626023316.539971-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agoriscv: stacktrace: fix usage of ftrace_graph_ret_addr()
Puranjay Mohan [Tue, 18 Jun 2024 14:58:20 +0000 (14:58 +0000)]
riscv: stacktrace: fix usage of ftrace_graph_ret_addr()

ftrace_graph_ret_addr() takes an `idx` integer pointer that is used to
optimize the stack unwinding. Pass it a valid pointer to utilize the
optimizations that might be available in the future.

The commit is making riscv's usage of ftrace_graph_ret_addr() match
x86_64.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618145820.62112-1-puranjay@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agoriscv: selftests: Fix vsetivli args for clang
Charlie Jenkins [Wed, 3 Jul 2024 01:54:48 +0000 (18:54 -0700)]
riscv: selftests: Fix vsetivli args for clang

Clang does not support implicit LMUL in the vset* instruction sequences.
Introduce an explicit LMUL in the vsetivli instruction.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: 9d5328eeb185 ("riscv: selftests: Add signal handling vector tests")
Link: https://lore.kernel.org/r/20240702-fix_sigreturn_test-v1-1-485f88a80612@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agoMerge patch series "Assorted fixes in RISC-V PMU driver"
Palmer Dabbelt [Wed, 3 Jul 2024 19:56:26 +0000 (12:56 -0700)]
Merge patch series "Assorted fixes in RISC-V PMU driver"

Atish Patra <atishp@rivosinc.com> says:

This series contains 3 fixes out of which the first one is a new fix
for invalid event data reported in lkml[2]. The last two are v3 of Samuel's
patch[1]. I added the RB/TB/Fixes tag and moved 1 unrelated change
to its own patch. I also changed an error message in kvm vcpu_pmu from
pr_err to pr_debug to avoid redundant failure error messages generated
due to the boot time quering of events implemented in the patch[1]

Here is the original cover letter for the patch[1]

Before this patch:
$ perf list hw

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]

$ perf stat -ddd true

 Performance counter stats for 'true':

              4.36 msec task-clock                       #    0.744 CPUs utilized
                 1      context-switches                 #  229.325 /sec
                 0      cpu-migrations                   #    0.000 /sec
                38      page-faults                      #    8.714 K/sec
         4,375,694      cycles                           #    1.003 GHz                         (60.64%)
           728,945      instructions                     #    0.17  insn per cycle
            79,199      branches                         #   18.162 M/sec
            17,709      branch-misses                    #   22.36% of all branches
           181,734      L1-dcache-loads                  #   41.676 M/sec
             5,547      L1-dcache-load-misses            #    3.05% of all L1-dcache accesses
     <not counted>      LLC-loads                                                               (0.00%)
     <not counted>      LLC-load-misses                                                         (0.00%)
     <not counted>      L1-icache-loads                                                         (0.00%)
     <not counted>      L1-icache-load-misses                                                   (0.00%)
     <not counted>      dTLB-loads                                                              (0.00%)
     <not counted>      dTLB-load-misses                                                        (0.00%)
     <not counted>      iTLB-loads                                                              (0.00%)
     <not counted>      iTLB-load-misses                                                        (0.00%)
     <not counted>      L1-dcache-prefetches                                                    (0.00%)
     <not counted>      L1-dcache-prefetch-misses                                               (0.00%)

       0.005860375 seconds time elapsed

       0.000000000 seconds user
       0.010383000 seconds sys

After this patch:
$ perf list hw

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]

$ perf stat -ddd true

 Performance counter stats for 'true':

              5.16 msec task-clock                       #    0.848 CPUs utilized
                 1      context-switches                 #  193.817 /sec
                 0      cpu-migrations                   #    0.000 /sec
                37      page-faults                      #    7.171 K/sec
         5,183,625      cycles                           #    1.005 GHz
           961,696      instructions                     #    0.19  insn per cycle
            85,853      branches                         #   16.640 M/sec
            20,462      branch-misses                    #   23.83% of all branches
           243,545      L1-dcache-loads                  #   47.203 M/sec
             5,974      L1-dcache-load-misses            #    2.45% of all L1-dcache accesses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses
   <not supported>      L1-icache-loads
   <not supported>      L1-icache-load-misses
   <not supported>      dTLB-loads
            19,619      dTLB-load-misses
   <not supported>      iTLB-loads
             6,831      iTLB-load-misses
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       0.006085625 seconds time elapsed

       0.000000000 seconds user
       0.013022000 seconds sys

[1] https://lore.kernel.org/linux-riscv/20240418014652.1143466-1-samuel.holland@sifive.com/
[2] https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/

* b4-shazam-merge:
  perf: RISC-V: Check standard event availability
  drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
  drivers/perf: riscv: Do not update the event data if uptodate

Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-0-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agoperf: RISC-V: Check standard event availability
Samuel Holland [Fri, 28 Jun 2024 07:51:43 +0000 (00:51 -0700)]
perf: RISC-V: Check standard event availability

The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events are exposed to userspace,
even when not actually implemented. They appear in the `perf list`
output, and commands like `perf stat` try to use them.

This is more than just a cosmetic issue, because the PMU driver's .add
function fails for these events, which causes pmu_groups_sched_in() to
prematurely stop scheduling in other (possibly valid) hardware events.

Add logic to check which events are supported by the hardware (i.e. can
be mapped to some counter), so only usable events are reported to
userspace. Since the kernel does not know the mapping between events and
possible counters, this check must happen during boot, when no counters
are in use. Make the check asynchronous to minimize impact on boot time.

Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agodrivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
Samuel Holland [Fri, 28 Jun 2024 07:51:42 +0000 (00:51 -0700)]
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus

Currently, we stop all the counters while a new cpu is brought online.
However, the hpmevent to counter mappings are not reset. The firmware may
have some stale encoding in their mapping structure which may lead to
undesirable results. We have not encountered such scenario though.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agodrivers/perf: riscv: Do not update the event data if uptodate
Atish Patra [Fri, 28 Jun 2024 07:51:41 +0000 (00:51 -0700)]
drivers/perf: riscv: Do not update the event data if uptodate

In case of an counter overflow, the event data may get corrupted
if called from an external overflow handler. This happens because
we can't update the counter without starting it when SBI PMU
extension is in use. However, the prev_count has been already
updated at the first pass while the counter value is still the
old one.

The solution is simple where we don't need to update it again
if it is already updated which can be detected using hwc state.
The event state in the overflow handler is updated in the following
patch. Thus, this fix can't be backported to kernel version where
overflow support was added.

Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function")
Closes:https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/

Reported-by: garthlei@pku.edu.cn
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-1-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
15 months agonilfs2: fix incorrect inode allocation from reserved inodes
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:35 +0000 (14:11 +0900)]
nilfs2: fix incorrect inode allocation from reserved inodes

If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.

Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.

If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.

Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.

Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agonilfs2: add missing check for inode numbers on directory entries
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:34 +0000 (14:11 +0900)]
nilfs2: add missing check for inode numbers on directory entries

Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().

As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).

The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.

Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.

Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.

Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack@suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agonilfs2: fix inode number range checks
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:33 +0000 (14:11 +0900)]
nilfs2: fix inode number range checks

Patch series "nilfs2: fix potential issues related to reserved inodes".

This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.

This patch (of 3):

In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.

As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.

In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.

Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.

Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.

Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: avoid overflows in dirty throttling logic
Jan Kara [Fri, 21 Jun 2024 14:42:38 +0000 (16:42 +0200)]
mm: avoid overflows in dirty throttling logic

The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits).  If limits end up being larger, we will hit overflows,
possible divisions by 0 etc.  Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway.  For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits.  For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc.  So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.

This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.

Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agoRevert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Jan Kara [Fri, 21 Jun 2024 14:42:37 +0000 (16:42 +0200)]
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"

Patch series "mm: Avoid possible overflows in dirty throttling".

Dirty throttling logic assumes dirty limits in page units fit into
32-bits.  This patch series makes sure this is true (see patch 2/2 for
more details).

This patch (of 2):

This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

The commit is broken in several ways.  Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this).  Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs.  We have
div64_ul() in case we want to be safe & cheap.  Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.

Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
15 months agomm: optimize the redundant loop of mm_update_owner_next()
Jinliang Zheng [Thu, 20 Jun 2024 12:21:24 +0000 (20:21 +0800)]
mm: optimize the redundant loop of mm_update_owner_next()

When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.

If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.

Recognize this situation in advance and exit early.

Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>