]> www.infradead.org Git - users/willy/xarray.git/log
users/willy/xarray.git
20 months agoDocumentation: qat: fix auto_reset section
Giovanni Cabiddu [Mon, 12 Feb 2024 13:05:09 +0000 (13:05 +0000)]
Documentation: qat: fix auto_reset section

Remove unneeded colon in the auto_reset section.

This resolves the following errors when building the documentation:

    Documentation/ABI/testing/sysfs-driver-qat:146: ERROR: Unexpected indentation.
    Documentation/ABI/testing/sysfs-driver-qat:146: WARNING: Block quote ends without a blank line; unexpected unindent.

Fixes: f5419a4239af ("crypto: qat - add auto reset on error")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-kernel/20240212144830.70495d07@canb.auug.org.au/T/
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - resolve race condition during AER recovery
Damian Muszynski [Fri, 9 Feb 2024 12:43:42 +0000 (13:43 +0100)]
crypto: qat - resolve race condition during AER recovery

During the PCI AER system's error recovery process, the kernel driver
may encounter a race condition with freeing the reset_data structure's
memory. If the device restart will take more than 10 seconds the function
scheduling that restart will exit due to a timeout, and the reset_data
structure will be freed. However, this data structure is used for
completion notification after the restart is completed, which leads
to a UAF bug.

This results in a KFENCE bug notice.

  BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
  Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
  adf_device_reset_worker+0x38/0xa0 [intel_qat]
  process_one_work+0x173/0x340

To resolve this race condition, the memory associated to the container
of the work_struct is freed on the worker if the timeout expired,
otherwise on the function that schedules the worker.
The timeout detection can be done by checking if the caller is
still waiting for completion or not by using completion_done() function.

Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - change SLAs cleanup flow at shutdown
Damian Muszynski [Fri, 9 Feb 2024 12:42:07 +0000 (13:42 +0100)]
crypto: qat - change SLAs cleanup flow at shutdown

The implementation of the Rate Limiting (RL) feature includes the cleanup
of all SLAs during device shutdown. For each SLA, the firmware is notified
of the removal through an admin message, the data structures that take
into account the budgets are updated and the memory is freed.
However, this explicit cleanup is not necessary as (1) the device is
reset, and the firmware state is lost and (2) all RL data structures
are freed anyway.

In addition, if the device is unresponsive, for example after a PCI
AER error is detected, the admin interface might not be available.
This might slow down the shutdown sequence and cause a timeout in
the recovery flows which in turn makes the driver believe that the
device is not recoverable.

Fix by replacing the explicit SLAs removal with just a free of the
SLA data structures.

Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agoMAINTAINERS: adjust file entries after crypto vmx file movement
Lukas Bulwahn [Thu, 8 Feb 2024 09:33:27 +0000 (10:33 +0100)]
MAINTAINERS: adjust file entries after crypto vmx file movement

Commit 109303336a0c ("crypto: vmx - Move to arch/powerpc/crypto") moves the
crypto vmx files to arch/powerpc, but misses to adjust the file entries for
IBM Power VMX Cryptographic instructions and LINUX FOR POWERPC.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about
broken references.

Adjust these file entries accordingly. To keep the matched files exact
after the movement, spell out each file name in the new directory.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/qm - change function type to void
Weili Qian [Wed, 7 Feb 2024 09:51:01 +0000 (17:51 +0800)]
crypto: hisilicon/qm - change function type to void

The function qm_stop_qp_nolock() always return zero, so
function type is changed to void.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/qm - obtain stop queue status
Weili Qian [Wed, 7 Feb 2024 09:51:00 +0000 (17:51 +0800)]
crypto: hisilicon/qm - obtain stop queue status

The debugfs files 'dev_state' and 'dev_timeout' are added.
Users can query the current queue stop status through these two
files. And set the waiting timeout when the queue is released.

dev_state: if dev_timeout is set, dev_state indicates the status
of stopping the queue. 0 indicates that the queue is stopped
successfully. Other values indicate that the queue stops fail.
If dev_timeout is not set, the value of dev_state is 0;

dev_timeout: if the queue fails to stop, the queue is released
after waiting dev_timeout * 20ms.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/qm - add stop function by hardware
Weili Qian [Wed, 7 Feb 2024 09:50:59 +0000 (17:50 +0800)]
crypto: hisilicon/qm - add stop function by hardware

Hardware V3 could be able to drain function by sending mailbox
to hardware which will trigger tasks in device to be flushed out.
When the function is reset, the function can be stopped by this way.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: ccp - State in dmesg that TSME is enabled
Borislav Petkov (AMD) [Mon, 5 Feb 2024 15:46:01 +0000 (16:46 +0100)]
crypto: ccp - State in dmesg that TSME is enabled

In the case when only TSME is enabled, it is useful to state that fact
too, so that users are aware that memory encryption is still enabled
even when the corresponding software variant of memory encryption is not
enabled.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: rsa - restrict plaintext/ciphertext values more
Joachim Vandersmissen [Sat, 3 Feb 2024 07:19:59 +0000 (01:19 -0600)]
crypto: rsa - restrict plaintext/ciphertext values more

SP 800-56Br2, Section 7.1.1 [1] specifies that:
1. If m does not satisfy 1 < m < (n – 1), output an indication that m is
out of range, and exit without further processing.

Similarly, Section 7.1.2 of the same standard specifies that:
1. If the ciphertext c does not satisfy 1 < c < (n – 1), output an
indication that the ciphertext is out of range, and exit without further
processing.

This range is slightly more conservative than RFC3447, as it also
excludes RSA fixed points 0, 1, and n - 1.

[1] https://doi.org/10.6028/NIST.SP.800-56Br2

Signed-off-by: Joachim Vandersmissen <git@jvdsn.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - improve aer error reset handling
Mun Chun Yep [Fri, 2 Feb 2024 10:53:24 +0000 (18:53 +0800)]
crypto: qat - improve aer error reset handling

Rework the AER reset and recovery flow to take into account root port
integrated devices that gets reset between the error detected and the
slot reset callbacks.

In adf_error_detected() the devices is gracefully shut down. The worker
threads are disabled, the error conditions are notified to listeners and
through PFVF comms and finally the device is reset as part of
adf_dev_down().

In adf_slot_reset(), the device is brought up again. If SRIOV VFs were
enabled before reset, these are re-enabled and VFs are notified of
restarting through PFVF comms.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - limit heartbeat notifications
Furong Zhou [Fri, 2 Feb 2024 10:53:23 +0000 (18:53 +0800)]
crypto: qat - limit heartbeat notifications

When the driver detects an heartbeat failure, it starts the recovery
flow. Set a limit so that the number of events is limited in case the
heartbeat status is read too frequently.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - add auto reset on error
Damian Muszynski [Fri, 2 Feb 2024 10:53:22 +0000 (18:53 +0800)]
crypto: qat - add auto reset on error

Expose the `auto_reset` sysfs attribute to configure the driver to reset
the device when a fatal error is detected.

When auto reset is enabled, the driver resets the device when it detects
either an heartbeat failure or a fatal error through an interrupt.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - add fatal error notification
Mun Chun Yep [Fri, 2 Feb 2024 10:53:21 +0000 (18:53 +0800)]
crypto: qat - add fatal error notification

Notify a fatal error condition and optionally reset the device in
the following cases:
  * if the device reports an uncorrectable fatal error through an
    interrupt
  * if the heartbeat feature detects that the device is not
    responding

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - re-enable sriov after pf reset
Mun Chun Yep [Fri, 2 Feb 2024 10:53:20 +0000 (18:53 +0800)]
crypto: qat - re-enable sriov after pf reset

When a Physical Function (PF) is reset, SR-IOV gets disabled, making the
associated Virtual Functions (VFs) unavailable. Even after reset and
using pci_restore_state, VFs remain uncreated because the numvfs still
at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs.

This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache
the SR-IOV enablement state. SR-IOV is only re-enabled if it was
previously configured.

This commit also introduces a dedicated workqueue without
`WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error
resets, preventing workqueue flushing warning.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - update PFVF protocol for recovery
Mun Chun Yep [Fri, 2 Feb 2024 10:53:19 +0000 (18:53 +0800)]
crypto: qat - update PFVF protocol for recovery

Update the PFVF logic to handle restart and recovery. This adds the
following functions:

  * adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the
    device detected a fatal error and requires a reset. This sends to
    VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`.
  * adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for
    `ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs
    before proceeding with a reset.
  * adf_pf2vf_notify_restarted(): enables the PF to notify VFs with
    an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that
    the device is back to normal. This prompts VF drivers switch back to
    use the accelerator for workload processing.

These changes improve the communication and synchronization between PF
and VF drivers during system restart and recovery processes.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - disable arbitration before reset
Furong Zhou [Fri, 2 Feb 2024 10:53:18 +0000 (18:53 +0800)]
crypto: qat - disable arbitration before reset

Disable arbitration to avoid new requests to be processed before
resetting a device.

This is needed so that new requests are not fetched when an error is
detected.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - add fatal error notify method
Furong Zhou [Fri, 2 Feb 2024 10:53:17 +0000 (18:53 +0800)]
crypto: qat - add fatal error notify method

Add error notify method to report a fatal error event to all the
subsystems registered. In addition expose an API,
adf_notify_fatal_error(), that allows to trigger a fatal error
notification asynchronously in the context of a workqueue.

This will be invoked when a fatal error is detected by the ISR or
through Heartbeat.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - add heartbeat error simulator
Damian Muszynski [Fri, 2 Feb 2024 10:53:16 +0000 (18:53 +0800)]
crypto: qat - add heartbeat error simulator

Add a mechanism that allows to inject a heartbeat error for testing
purposes.
A new attribute `inject_error` is added to debugfs for each QAT device.
Upon a write on this attribute, the driver will inject an error on the
device which can then be detected by the heartbeat feature.
Errors are breaking the device functionality thus they require a
device reset in order to be recovered.

This functionality is not compiled by default, to enable it
CRYPTO_DEV_QAT_ERROR_INJECTION must be set.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: virtio - remove duplicate check if queue is broken
Li RongQing [Thu, 1 Feb 2024 06:17:16 +0000 (14:17 +0800)]
crypto: virtio - remove duplicate check if queue is broken

virtqueue_enable_cb() will call virtqueue_poll() which will check if
queue is broken at beginning, so remove the virtqueue_is_broken() call

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: xilinx - call finalize with bh disabled
Quanyang Wang [Sun, 28 Jan 2024 04:29:06 +0000 (12:29 +0800)]
crypto: xilinx - call finalize with bh disabled

When calling crypto_finalize_request, BH should be disabled to avoid
triggering the following calltrace:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 74 at crypto/crypto_engine.c:58 crypto_finalize_request+0xa0/0x118
    Modules linked in: cryptodev(O)
    CPU: 2 PID: 74 Comm: firmware:zynqmp Tainted: G           O       6.8.0-rc1-yocto-standard #323
    Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
    pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : crypto_finalize_request+0xa0/0x118
    lr : crypto_finalize_request+0x104/0x118
    sp : ffffffc085353ce0
    x29: ffffffc085353ce0 x28: 0000000000000000 x27: ffffff8808ea8688
    x26: ffffffc081715038 x25: 0000000000000000 x24: ffffff880100db00
    x23: ffffff880100da80 x22: 0000000000000000 x21: 0000000000000000
    x20: ffffff8805b14000 x19: ffffff880100da80 x18: 0000000000010450
    x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    x14: 0000000000000003 x13: 0000000000000000 x12: ffffff880100dad0
    x11: 0000000000000000 x10: ffffffc0832dcd08 x9 : ffffffc0812416d8
    x8 : 00000000000001f4 x7 : ffffffc0830d2830 x6 : 0000000000000001
    x5 : ffffffc082091000 x4 : ffffffc082091658 x3 : 0000000000000000
    x2 : ffffffc7f9653000 x1 : 0000000000000000 x0 : ffffff8802d20000
    Call trace:
     crypto_finalize_request+0xa0/0x118
     crypto_finalize_aead_request+0x18/0x30
     zynqmp_handle_aes_req+0xcc/0x388
     crypto_pump_work+0x168/0x2d8
     kthread_worker_fn+0xfc/0x3a0
     kthread+0x118/0x138
     ret_from_fork+0x10/0x20
    irq event stamp: 40
    hardirqs last  enabled at (39): [<ffffffc0812416f8>] _raw_spin_unlock_irqrestore+0x70/0xb0
    hardirqs last disabled at (40): [<ffffffc08122d208>] el1_dbg+0x28/0x90
    softirqs last  enabled at (36): [<ffffffc080017dec>] kernel_neon_begin+0x8c/0xf0
    softirqs last disabled at (34): [<ffffffc080017dc0>] kernel_neon_begin+0x60/0xf0
    ---[ end trace 0000000000000000 ]---

Fixes: 4d96f7d48131 ("crypto: xilinx - Add Xilinx AES driver")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: ahash - unexport crypto_hash_alg_has_setkey()
Eric Biggers [Sat, 27 Jan 2024 07:49:27 +0000 (23:49 -0800)]
crypto: ahash - unexport crypto_hash_alg_has_setkey()

Since crypto_hash_alg_has_setkey() is only called from ahash.c itself,
make it a static function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/sec - remove unused parameter
Wenkai Lin [Fri, 26 Jan 2024 09:38:28 +0000 (17:38 +0800)]
crypto: hisilicon/sec - remove unused parameter

Unused parameter of static functions should be removed.

Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/sec2 - fix some cleanup issues
Qi Tao [Fri, 26 Jan 2024 09:38:27 +0000 (17:38 +0800)]
crypto: hisilicon/sec2 - fix some cleanup issues

This patch fixes following cleanup issues:
 - The return value of the function is
   inconsistent with the actual return type.
 - After the pointer type is directly converted
   to the `__le64` type, the program may crash
   or produce unexpected results.

Signed-off-by: Qi Tao <taoqi10@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/sec2 - modify nested macro call
Qi Tao [Fri, 26 Jan 2024 09:38:26 +0000 (17:38 +0800)]
crypto: hisilicon/sec2 - modify nested macro call

Nested macros are integrated into a single macro,
making the code simpler.

Signed-off-by: Qi Tao <taoqi10@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/sec2 - updates the sec DFX function register
Qi Tao [Fri, 26 Jan 2024 09:38:25 +0000 (17:38 +0800)]
crypto: hisilicon/sec2 - updates the sec DFX function register

As the sec DFX function is enhanced, some RAS registers are added
to the original DFX registers to enhance the DFX positioning function.

Signed-off-by: Qi Tao <taoqi10@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agodt-bindings: crypto: ice: Document SC7180 inline crypto engine
David Wronek [Sun, 21 Jan 2024 16:57:41 +0000 (17:57 +0100)]
dt-bindings: crypto: ice: Document SC7180 inline crypto engine

Document the compatible used for the inline crypto engine found on
SC7180.

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David Wronek <davidwronek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: testmgr - remove unused xts4096 and xts512 algorithms from testmgr.c
Joachim Vandersmissen [Sun, 21 Jan 2024 19:45:26 +0000 (13:45 -0600)]
crypto: testmgr - remove unused xts4096 and xts512 algorithms from testmgr.c

Commit a93492cae30a ("crypto: ccree - remove data unit size support")
removed support for the xts512 and xts4096 algorithms, but left them
defined in testmgr.c. This patch removes those definitions.

Signed-off-by: Joachim Vandersmissen <git@jvdsn.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - use kcalloc_node() instead of kzalloc_node()
Erick Archer [Sun, 21 Jan 2024 16:40:43 +0000 (17:40 +0100)]
crypto: qat - use kcalloc_node() instead of kzalloc_node()

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, use the purpose specific kcalloc_node() function instead of the
argument count * size in the kzalloc_node() function.

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: sun8i-ce - Use kcalloc() instead of kzalloc()
Erick Archer [Sun, 21 Jan 2024 15:34:07 +0000 (16:34 +0100)]
crypto: sun8i-ce - Use kcalloc() instead of kzalloc()

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon - Fix smp_processor_id() warnings
Wenkai Lin [Fri, 19 Jan 2024 08:11:07 +0000 (16:11 +0800)]
crypto: hisilicon - Fix smp_processor_id() warnings

Switch to raw_smp_processor_id() to prevent a number of
warnings from kernel debugging. We do not care about
preemption here, as the CPU number is only used as a
poor mans load balancing or device selection. If preemption
happens during an encrypt/decrypt operation a small performance
hit will occur but everything will continue to work, so just
ignore it. This commit is similar to e7a9b05ca4
("crypto: cavium - Fix smp_processor_id() warnings").

[ 7538.874350] BUG: using smp_processor_id() in preemptible [00000000] code: af_alg06/8438
[ 7538.874368] caller is debug_smp_processor_id+0x1c/0x28
[ 7538.874373] CPU: 50 PID: 8438 Comm: af_alg06 Kdump: loaded Not tainted 5.10.0.pc+ #18
[ 7538.874377] Call trace:
[ 7538.874387]  dump_backtrace+0x0/0x210
[ 7538.874389]  show_stack+0x2c/0x38
[ 7538.874392]  dump_stack+0x110/0x164
[ 7538.874394]  check_preemption_disabled+0xf4/0x108
[ 7538.874396]  debug_smp_processor_id+0x1c/0x28
[ 7538.874406]  sec_create_qps+0x24/0xe8 [hisi_sec2]
[ 7538.874408]  sec_ctx_base_init+0x20/0x4d8 [hisi_sec2]
[ 7538.874411]  sec_aead_ctx_init+0x68/0x180 [hisi_sec2]
[ 7538.874413]  sec_aead_sha256_ctx_init+0x28/0x38 [hisi_sec2]
[ 7538.874421]  crypto_aead_init_tfm+0x54/0x68
[ 7538.874423]  crypto_create_tfm_node+0x6c/0x110
[ 7538.874424]  crypto_alloc_tfm_node+0x74/0x288
[ 7538.874426]  crypto_alloc_aead+0x40/0x50
[ 7538.874431]  aead_bind+0x50/0xd0
[ 7538.874433]  alg_bind+0x94/0x148
[ 7538.874439]  __sys_bind+0x98/0x118
[ 7538.874441]  __arm64_sys_bind+0x28/0x38
[ 7538.874445]  do_el0_svc+0x88/0x258
[ 7538.874447]  el0_svc+0x1c/0x28
[ 7538.874449]  el0_sync_handler+0x8c/0xb8
[ 7538.874452]  el0_sync+0x148/0x180

Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:37 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helpers

The C glue code already infers whether or not the current iteration is
the final one, by comparing walk.nbytes with walk.total. This means we
can easily inform the asm helpers of this as well, by conditionally
passing a pointer to the original IV, which is used in the finalization
of the MAC. This removes the need for a separate call into the asm code
to perform the finalization.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:36 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Merge encrypt and decrypt tail handling

The encryption and decryption code paths are mostly identical, except
for a small difference where the plaintext input into the MAC is taken
from either the input or the output block.

We can factor this in quite easily using a vector bit select, and a few
additional XORs, without the need for branches. This way, we can use the
same tail handling logic on the encrypt and decrypt code paths, allowing
further consolidation of the asm helpers in a subsequent patch.

(In the main loop, adding just a handful of ALU instructions results in
a noticeable performance hit [around 5% on Apple M2], so those routines
are kept separate)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Cache round keys and unroll AES loops
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:35 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Cache round keys and unroll AES loops

The CCM code as originally written attempted to use as few NEON
registers as possible, to avoid having to eagerly preserve/restore the
entire NEON register file at every call to kernel_neon_begin/end. At
that time, this API took a number of NEON registers as a parameter, and
only preserved that many registers.

Today, the NEON register file is restored lazily, and the old API is
long gone. This means we can use as many NEON registers as we can make
meaningful use of, which means in the AES case that we can keep all
round keys in registers rather than reloading each of them for each AES
block processed.

On Cortex-A53, this results in a speedup of more than 50%. (From 4
cycles per byte to 2.6 cycles per byte)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Reuse existing MAC update for AAD input
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:34 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input

CCM combines the counter (CTR) encryption mode with a MAC based on the
same block cipher. This MAC construction is a bit clunky: it invokes the
block cipher in a way that cannot be parallelized, resulting in poor CPU
pipeline efficiency.

The arm64 CCM code mitigates this by interleaving the encryption and MAC
at the AES round level, resulting in a substantial speedup. But this
approach does not apply to the additional authenticated data (AAD) which
is not encrypted.

This means the special asm routine dealing with the AAD is not any
better than the MAC update routine used by the arm64 AES block
encryption driver, so let's reuse that, and drop the special AES-CCM
version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:33 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute

Implement the CCM tail handling using a single sequence that uses
permute vectors and overlapping loads and stores, rather than going over
the tail byte by byte in a loop, and using scalar operations. This is
more efficient, even though the measured speedup is only around 1-2% on
the CPUs I have tried.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Pass short inputs via stack buffer
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:32 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Pass short inputs via stack buffer

In preparation for optimizing the CCM core asm code using permutation
vectors and overlapping loads and stores, ensure that inputs shorter
than the size of a AES block are passed via a buffer on the stack, in a
way that positions the data at the end of a 16 byte buffer. This removes
the need for the asm code to reason about a rare corner case where the
tail of the data cannot be read/written using a single NEON load/store
instruction.

While at it, tweak the copyright header and authorship to bring it up to
date.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:31 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk

Now that kernel mode NEON no longer disables preemption, we no longer
have to take care to disable and re-enable use of the NEON when calling
into the skcipher walk API. So just keep it enabled until done.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"
Ard Biesheuvel [Thu, 18 Jan 2024 17:06:30 +0000 (18:06 +0100)]
crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"

This reverts commit 57ead1bf1c54, which updated the CCM code to only
rely on walk.nbytes to check for failures returned from the skcipher
walk API, mostly for the common good rather than to fix a particular
problem in the code.

This change introduces a problem of its own: the skcipher walk is
started with the 'atomic' argument set to false, which means that the
skcipher walk API is permitted to sleep. Subsequently, it invokes
skcipher_walk_done() with preemption disabled on the final iteration of
the loop. This appears to work by accident, but it is arguably a bad
example, and providing a better example was the point of the original
patch.

Given that future changes to the CCM code will rely on the original
behavior of entering the loop even for zero sized inputs, let's just
revert this change entirely, and proceed from there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: asymmetric_keys - remove redundant pointer secs
Colin Ian King [Thu, 18 Jan 2024 12:07:45 +0000 (12:07 +0000)]
crypto: asymmetric_keys - remove redundant pointer secs

The pointer secs is being assigned a value however secs is never
read afterwards. The pointer secs is redundant and can be removed.

Cleans up clang scan build warning:
warning: Although the value stored to 'secs' is used in the enclosing
expression, the value is never actually read from 'secs'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: pcbc - remove redundant assignment to nbytes
Colin Ian King [Tue, 16 Jan 2024 10:43:02 +0000 (10:43 +0000)]
crypto: pcbc - remove redundant assignment to nbytes

The assignment to nbytes is redundant, the while loop needs
to just refer to the value in walk.nbytes and the value of
nbytes is being re-assigned inside the loop on both paths
of the following if-statement.  Remove redundant assignment.

Cleans up clang scan build warning:
warning: Although the value stored to 'nbytes' is used in
the enclosing expression, the value is never actually read
from 'nbytes' [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/qm - dump important registers values before resetting
Weili Qian [Fri, 12 Jan 2024 10:25:46 +0000 (18:25 +0800)]
crypto: hisilicon/qm - dump important registers values before resetting

Read the values of some device registers before the device
is reset, these values help analyze the cause of the device exception.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: hisilicon/qm - support get device state
Weili Qian [Fri, 12 Jan 2024 10:25:45 +0000 (18:25 +0800)]
crypto: hisilicon/qm - support get device state

Support get device current state. The value 0 indicates that
the device is busy, and the value 1 indicates that the
device is idle. When the device is in suspended, 1 is returned.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: iaa - Remove unnecessary debugfs_create_dir() error check in iaa_crypto_debug...
Minjie Du [Tue, 9 Jan 2024 02:19:14 +0000 (10:19 +0800)]
crypto: iaa - Remove unnecessary debugfs_create_dir() error check in iaa_crypto_debugfs_init()

This patch removes the debugfs_create_dir() error checking in
iaa_crypto_debugfs_init(). Because the debugfs_create_dir() is developed
in a way that the caller can safely handle the errors that
occur during the creation of DebugFS nodes.

Signed-off-by: Minjie Du <duminjie@vivo.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: iaa - Remove header table code
Tom Zanussi [Mon, 8 Jan 2024 22:53:48 +0000 (16:53 -0600)]
crypto: iaa - Remove header table code

The header table and related code is currently unused - it was
included and used for canned mode, but canned mode has been removed,
so this code can be safely removed as well.

This indirectly fixes a bug reported by Dan Carpenter.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-crypto/b2e0bd974981291e16882686a2b9b1db3986abe4.camel@linux.intel.com/T/#m4403253d6a4347a925fab4fc1cdb4ef7c095fb86
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agoKEYS: include header for EINVAL definition
Clay Chang [Sun, 7 Jan 2024 13:28:42 +0000 (21:28 +0800)]
KEYS: include header for EINVAL definition

This patch includes linux/errno.h to address the issue of 'EINVAL' being
undeclared.

Signed-off-by: Clay Chang <clayc@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agodt-bindings: qcom-qce: Add compatible for SM6350
Luca Weiss [Fri, 5 Jan 2024 16:15:43 +0000 (17:15 +0100)]
dt-bindings: qcom-qce: Add compatible for SM6350

Add a compatible for the crypto block found on the SM6350 SoC.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - avoid memcpy() overflow warning
Arnd Bergmann [Wed, 3 Jan 2024 16:26:02 +0000 (17:26 +0100)]
crypto: qat - avoid memcpy() overflow warning

The use of array_size() leads gcc to assume the memcpy() can have a larger
limit than actually possible, which triggers a string fortification warning:

In file included from include/linux/string.h:296,
                 from include/linux/bitmap.h:12,
                 from include/linux/cpumask.h:12,
                 from include/linux/sched.h:16,
                 from include/linux/delay.h:23,
                 from include/linux/iopoll.h:12,
                 from drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:3:
In function 'fortify_memcpy_chk',
    inlined from 'adf_gen4_init_thd2arb_map' at drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:401:3:
include/linux/fortify-string.h:579:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
  579 |    __write_overflow_field(p_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/fortify-string.h:588:4: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  588 |    __read_overflow2_field(q_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add an explicit range check to avoid this.

Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: vmx - Move to arch/powerpc/crypto
Danny Tsen [Tue, 2 Jan 2024 20:58:56 +0000 (15:58 -0500)]
crypto: vmx - Move to arch/powerpc/crypto

Relocate all crypto files in vmx driver to arch/powerpc/crypto directory
and remove vmx directory.

drivers/crypto/vmx/aes.c rename to arch/powerpc/crypto/aes.c
drivers/crypto/vmx/aes_cbc.c rename to arch/powerpc/crypto/aes_cbc.c
drivers/crypto/vmx/aes_ctr.c rename to arch/powerpc/crypto/aes_ctr.c
drivers/crypto/vmx/aes_xts.c rename to arch/powerpc/crypto/aes_xts.c
drivers/crypto/vmx/aesp8-ppc.h rename to arch/powerpc/crypto/aesp8-ppc.h
drivers/crypto/vmx/aesp8-ppc.pl rename to arch/powerpc/crypto/aesp8-ppc.pl
drivers/crypto/vmx/ghash.c rename to arch/powerpc/crypto/ghash.c
drivers/crypto/vmx/ghashp8-ppc.pl rename to arch/powerpc/crypto/ghashp8-ppc.pl
drivers/crypto/vmx/vmx.c rename to arch/powerpc/crypto/vmx.c

deleted files:
drivers/crypto/vmx/Makefile
drivers/crypto/vmx/Kconfig
drivers/crypto/vmx/ppc-xlate.pl

This patch has been tested has passed the selftest.  The patch is also tested with
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS enabled.

Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: virtio - Less function calls in __virtio_crypto_akcipher_do_req() after error...
Markus Elfring [Tue, 26 Dec 2023 10:00:20 +0000 (11:00 +0100)]
crypto: virtio - Less function calls in __virtio_crypto_akcipher_do_req() after error detection

The kfree() function was called in up to two cases by the
__virtio_crypto_akcipher_do_req() function during error handling
even if the passed variable contained a null pointer.
This issue was detected by using the Coccinelle software.

* Adjust jump targets.

* Delete two initialisations which became unnecessary
  with this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: caam - fix asynchronous hash
Gaurav Jain [Thu, 18 Jan 2024 09:25:57 +0000 (14:55 +0530)]
crypto: caam - fix asynchronous hash

ahash_alg->setkey is updated to ahash_nosetkey in ahash.c
so checking setkey() function to determine hmac algorithm is not valid.

to fix this added is_hmac variable in structure caam_hash_alg to determine
whether the algorithm is hmac or not.

Fixes: 2f1f34c1bf7b ("crypto: ahash - optimize performance when wrapping shash")
Signed-off-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
20 months agocrypto: qat - fix arbiter mapping generation algorithm for QAT 402xx
Damian Muszynski [Fri, 19 Jan 2024 16:12:38 +0000 (17:12 +0100)]
crypto: qat - fix arbiter mapping generation algorithm for QAT 402xx

The commit "crypto: qat - generate dynamically arbiter mappings"
introduced a regression on qat_402xx devices.
This is reported when the driver probes the device, as indicated by
the following error messages:

  4xxx 0000:0b:00.0: enabling device (0140 -> 0142)
  4xxx 0000:0b:00.0: Generate of the thread to arbiter map failed
  4xxx 0000:0b:00.0: Direct firmware load for qat_402xx_mmp.bin failed with error -2

The root cause of this issue was the omission of a necessary function
pointer required by the mapping algorithm during the implementation.
Fix it by adding the missing function pointer.

Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
21 months agoLinux 6.8-rc1
Linus Torvalds [Sun, 21 Jan 2024 22:11:32 +0000 (14:11 -0800)]
Linux 6.8-rc1

21 months agoMerge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Sun, 21 Jan 2024 22:01:12 +0000 (14:01 -0800)]
Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs updates from Kent Overstreet:
 "Some fixes, Some refactoring, some minor features:

   - Assorted prep work for disk space accounting rewrite

   - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
     makes our trigger context more explicit

   - A few fixes to avoid excessive transaction restarts on
     multithreaded workloads: fstests (in addition to ktest tests) are
     now checking slowpath counters, and that's shaking out a few bugs

   - Assorted tracepoint improvements

   - Starting to break up bcachefs_format.h and move on disk types so
     they're with the code they belong to; this will make room to start
     documenting the on disk format better.

   - A few minor fixes"

* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
  bcachefs: Improve inode_to_text()
  bcachefs: logged_ops_format.h
  bcachefs: reflink_format.h
  bcachefs; extents_format.h
  bcachefs: ec_format.h
  bcachefs: subvolume_format.h
  bcachefs: snapshot_format.h
  bcachefs: alloc_background_format.h
  bcachefs: xattr_format.h
  bcachefs: dirent_format.h
  bcachefs: inode_format.h
  bcachefs; quota_format.h
  bcachefs: sb-counters_format.h
  bcachefs: counters.c -> sb-counters.c
  bcachefs: comment bch_subvolume
  bcachefs: bch_snapshot::btime
  bcachefs: add missing __GFP_NOWARN
  bcachefs: opts->compression can now also be applied in the background
  bcachefs: Prep work for variable size btree node buffers
  bcachefs: grab s_umount only if snapshotting
  ...

21 months agoMerge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Jan 2024 19:14:40 +0000 (11:14 -0800)]
Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for time and clocksources:

   - A fix for the idle and iowait time accounting vs CPU hotplug.

     The time is reset on CPU hotplug which makes the accumulated
     systemwide time jump backwards.

   - Assorted fixes and improvements for clocksource/event drivers"

* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
  clocksource/drivers/ep93xx: Fix error handling during probe
  clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
  clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
  clocksource/timer-riscv: Add riscv_clock_shutdown callback
  dt-bindings: timer: Add StarFive JH8100 clint
  dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs

21 months agoMerge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 21 Jan 2024 19:04:29 +0000 (11:04 -0800)]
Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Aneesh Kumar:

 - Increase default stack size to 32KB for Book3S

Thanks to Michael Ellerman.

* tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Increase default stack size to 32KB

21 months agobcachefs: Improve inode_to_text()
Kent Overstreet [Sun, 21 Jan 2024 17:19:01 +0000 (12:19 -0500)]
bcachefs: Improve inode_to_text()

Add line breaks - inode_to_text() is now much easier to read.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: logged_ops_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:57:45 +0000 (02:57 -0500)]
bcachefs: logged_ops_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: reflink_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:54:47 +0000 (02:54 -0500)]
bcachefs: reflink_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs; extents_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:51:56 +0000 (02:51 -0500)]
bcachefs; extents_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: ec_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:47:14 +0000 (02:47 -0500)]
bcachefs: ec_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: subvolume_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:42:53 +0000 (02:42 -0500)]
bcachefs: subvolume_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: snapshot_format.h
Kent Overstreet [Sun, 21 Jan 2024 07:41:06 +0000 (02:41 -0500)]
bcachefs: snapshot_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: alloc_background_format.h
Kent Overstreet [Sun, 21 Jan 2024 05:01:52 +0000 (00:01 -0500)]
bcachefs: alloc_background_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: xattr_format.h
Kent Overstreet [Sun, 21 Jan 2024 04:59:15 +0000 (23:59 -0500)]
bcachefs: xattr_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: dirent_format.h
Kent Overstreet [Sun, 21 Jan 2024 04:57:10 +0000 (23:57 -0500)]
bcachefs: dirent_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: inode_format.h
Kent Overstreet [Sun, 21 Jan 2024 04:55:39 +0000 (23:55 -0500)]
bcachefs: inode_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs; quota_format.h
Kent Overstreet [Sun, 21 Jan 2024 04:53:52 +0000 (23:53 -0500)]
bcachefs; quota_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: sb-counters_format.h
Kent Overstreet [Sun, 21 Jan 2024 04:50:56 +0000 (23:50 -0500)]
bcachefs: sb-counters_format.h

bcachefs_format.h has gotten too big; let's do some organizing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: counters.c -> sb-counters.c
Kent Overstreet [Sun, 21 Jan 2024 04:46:35 +0000 (23:46 -0500)]
bcachefs: counters.c -> sb-counters.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: comment bch_subvolume
Kent Overstreet [Sun, 21 Jan 2024 04:44:17 +0000 (23:44 -0500)]
bcachefs: comment bch_subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bch_snapshot::btime
Kent Overstreet [Sun, 21 Jan 2024 04:35:41 +0000 (23:35 -0500)]
bcachefs: bch_snapshot::btime

Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: add missing __GFP_NOWARN
Kent Overstreet [Wed, 17 Jan 2024 22:16:07 +0000 (17:16 -0500)]
bcachefs: add missing __GFP_NOWARN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: opts->compression can now also be applied in the background
Kent Overstreet [Tue, 16 Jan 2024 21:20:21 +0000 (16:20 -0500)]
bcachefs: opts->compression can now also be applied in the background

The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Prep work for variable size btree node buffers
Kent Overstreet [Tue, 16 Jan 2024 18:29:59 +0000 (13:29 -0500)]
bcachefs: Prep work for variable size btree node buffers

bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: grab s_umount only if snapshotting
Su Yue [Mon, 15 Jan 2024 02:21:25 +0000 (10:21 +0800)]
bcachefs: grab s_umount only if snapshotting

When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d237320e98 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit
Su Yue [Tue, 16 Jan 2024 11:05:37 +0000 (19:05 +0800)]
bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit

bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut.
It should be freed by kvfree not kfree.
Or umount will triger:

[  406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
[  406.830676 ] #PF: supervisor read access in kernel mode
[  406.831643 ] #PF: error_code(0x0000) - not-present page
[  406.832487 ] PGD 0 P4D 0
[  406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI
[  406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G           OE      6.7.0-rc7-custom+ #90
[  406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[  406.835796 ] RIP: 0010:kfree+0x62/0x140
[  406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
[  406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
[  406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
[  406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
[  406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
[  406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
[  406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
[  406.840451 ] FS:  00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
[  406.840851 ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
[  406.841464 ] Call Trace:
[  406.841583 ]  <TASK>
[  406.841682 ]  ? __die+0x1f/0x70
[  406.841828 ]  ? page_fault_oops+0x159/0x470
[  406.842014 ]  ? fixup_exception+0x22/0x310
[  406.842198 ]  ? exc_page_fault+0x1ed/0x200
[  406.842382 ]  ? asm_exc_page_fault+0x22/0x30
[  406.842574 ]  ? bch2_fs_release+0x54/0x280 [bcachefs]
[  406.842842 ]  ? kfree+0x62/0x140
[  406.842988 ]  ? kfree+0x104/0x140
[  406.843138 ]  bch2_fs_release+0x54/0x280 [bcachefs]
[  406.843390 ]  kobject_put+0xb7/0x170
[  406.843552 ]  deactivate_locked_super+0x2f/0xa0
[  406.843756 ]  cleanup_mnt+0xba/0x150
[  406.843917 ]  task_work_run+0x59/0xa0
[  406.844083 ]  exit_to_user_mode_prepare+0x197/0x1a0
[  406.844302 ]  syscall_exit_to_user_mode+0x16/0x40
[  406.844510 ]  do_syscall_64+0x4e/0xf0
[  406.844675 ]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  406.844907 ] RIP: 0033:0x7f0a2664e4fb

Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bios must be 512 byte algined
Kent Overstreet [Tue, 16 Jan 2024 16:38:04 +0000 (11:38 -0500)]
bcachefs: bios must be 512 byte algined

Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: remove redundant variable tmp
Colin Ian King [Tue, 16 Jan 2024 11:07:23 +0000 (11:07 +0000)]
bcachefs: remove redundant variable tmp

The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.

Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Improve trace_trans_restart_relock
Kent Overstreet [Tue, 16 Jan 2024 01:40:06 +0000 (20:40 -0500)]
bcachefs: Improve trace_trans_restart_relock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Fix excess transaction restarts in __bchfs_fallocate()
Kent Overstreet [Tue, 16 Jan 2024 01:37:23 +0000 (20:37 -0500)]
bcachefs: Fix excess transaction restarts in __bchfs_fallocate()

drop_locks_do() should not be used in a fastpath without first trying
the do in nonblocking mode - the unlock and relock will cause excessive
transaction restarts and potentially livelocking with other threads that
are contending for the same locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: extents_to_bp_state
Kent Overstreet [Mon, 15 Jan 2024 23:19:52 +0000 (18:19 -0500)]
bcachefs: extents_to_bp_state

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bkey_and_val_eq()
Kent Overstreet [Mon, 15 Jan 2024 23:08:32 +0000 (18:08 -0500)]
bcachefs: bkey_and_val_eq()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Better journal tracepoints
Kent Overstreet [Mon, 15 Jan 2024 22:59:51 +0000 (17:59 -0500)]
bcachefs: Better journal tracepoints

Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Print size of superblock with space allocated
Kent Overstreet [Mon, 15 Jan 2024 22:57:44 +0000 (17:57 -0500)]
bcachefs: Print size of superblock with space allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Avoid flushing the journal in the discard path
Kent Overstreet [Mon, 15 Jan 2024 22:56:22 +0000 (17:56 -0500)]
bcachefs: Avoid flushing the journal in the discard path

When issuing discards, we may need to flush the journal if there's too
many buckets that can't be discarded until a journal flush.

But the heuristic was bad; we should be comparing the number of buckets
that need to flushes against the number of free buckets, not the number
of buckets we saw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Improve move_extent tracepoint
Kent Overstreet [Mon, 15 Jan 2024 20:33:39 +0000 (15:33 -0500)]
bcachefs: Improve move_extent tracepoint

Also print out the data_opts, so that we can see what specifically is
being done to an extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Add missing bch2_moving_ctxt_flush_all()
Kent Overstreet [Mon, 15 Jan 2024 20:06:43 +0000 (15:06 -0500)]
bcachefs: Add missing bch2_moving_ctxt_flush_all()

This fixes a bug with rebalance IOs getting stuck with reads completed,
but writes never being issued.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Re-add move_extent_write tracepoint
Kent Overstreet [Mon, 15 Jan 2024 20:04:40 +0000 (15:04 -0500)]
bcachefs: Re-add move_extent_write tracepoint

It appears this was accidentally deleted at some point - also, do a bit
of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount
Kent Overstreet [Mon, 15 Jan 2024 19:15:26 +0000 (14:15 -0500)]
bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount

Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code
that uses it to be woken up for other reasons, and fixes a bug where
rebalance wouldn't wake up when a scan was requested.

This raises the possibility of spurious wakeups, but callers should
always be able to handle that reasonably well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Add .val_to_text() for KEY_TYPE_cookie
Kent Overstreet [Mon, 15 Jan 2024 19:15:03 +0000 (14:15 -0500)]
bcachefs: Add .val_to_text() for KEY_TYPE_cookie

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: Don't pass memcmp() as a pointer
Kent Overstreet [Mon, 15 Jan 2024 19:12:43 +0000 (14:12 -0500)]
bcachefs: Don't pass memcmp() as a pointer

Some (buggy!) compilers have issues with this.

Fixes: https://github.com/koverstreet/bcachefs/issues/625
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agoMerge tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Sun, 21 Jan 2024 18:21:43 +0000 (10:21 -0800)]
Merge tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs

Pull header fix from Kent Overstreet:
 "Just one small fixup for the RT build"

* tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs:
  spinlock: Fix failing build for PREEMPT_RT

21 months agobcachefs: Reduce would_deadlock restarts
Kent Overstreet [Thu, 11 Jan 2024 04:47:04 +0000 (23:47 -0500)]
bcachefs: Reduce would_deadlock restarts

We don't have to take locks in any particular ordering - we'll make
forward progress just fine - but if we try to stick to an ordering, it
can help to avoid excessive would_deadlock transaction restarts.

This tweaks the reflink path to take extents btree locks in the right
order.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bch2_trans_account_disk_usage_change()
Kent Overstreet [Sat, 11 Nov 2023 20:08:36 +0000 (15:08 -0500)]
bcachefs: bch2_trans_account_disk_usage_change()

The disk space accounting rewrite is splitting out accounting for each
replicas set - those are moving to btree keys, instead of percpu
counters.

This breaks bch2_trans_fs_usage_apply() up, splitting out the part we
will still need.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bch_fs_usage_base
Kent Overstreet [Fri, 17 Nov 2023 05:03:45 +0000 (00:03 -0500)]
bcachefs: bch_fs_usage_base

Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: bch2_prt_compression_type()
Kent Overstreet [Sun, 7 Jan 2024 02:01:47 +0000 (21:01 -0500)]
bcachefs: bch2_prt_compression_type()

bounds checking helper, since compression types are extensible

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: helpers for printing data types
Kent Overstreet [Sun, 7 Jan 2024 01:57:43 +0000 (20:57 -0500)]
bcachefs: helpers for printing data types

We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: BTREE_TRIGGER_ATOMIC
Kent Overstreet [Sun, 7 Jan 2024 22:14:46 +0000 (17:14 -0500)]
bcachefs: BTREE_TRIGGER_ATOMIC

Add a new flag to be explicit about when we're running atomic triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: drop to_text code for obsolete bps in alloc keys
Kent Overstreet [Sun, 7 Jan 2024 00:47:09 +0000 (19:47 -0500)]
bcachefs: drop to_text code for obsolete bps in alloc keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
21 months agobcachefs: eytzinger_for_each() declares loop iter
Kent Overstreet [Sun, 7 Jan 2024 00:29:14 +0000 (19:29 -0500)]
bcachefs: eytzinger_for_each() declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>